ABSTRACT


Manning United States Army Reserve (USAR) units is fundamentally different 
from manning Regular Army (RA) units. A soldier assigned to a USAR unit must live 
within 75 miles or a 90-minute commute of his Reserve Center (RC). This makes reserve
unit positioning a key factor in the ability to recruit to fill the unit.


This thesis automates, documents, reconciles, and assembles data on over 30,000 
ZIP Codes, over 800 RCs, and over 260 Military Occupational Specialties (MOSs), 
drawing on and integrating over a dozen disparate databases. This effort produces a
single data file with demographic, vocational, and economic data on every ZIP Code in
America, along with the six-year results of its RA, USAR, and sister service recruit
production, and MOS suitability for each of the 264 MOSs.


Preliminary model development accounts for about 70% of the variation in recruit
production by ZIP Code. This thesis also develops models for the top five MOSs to predict
the maximum number of recruits obtained from a ZIP Code for that MOS. Examples
illustrate that ZIP Codes vary in their ability to provide recruits with sufficient aptitude
for technical fields.


Two subsequent theses will use these results. One completes the MOS models.
The second uses the models as constraints in an optimization model to position RCs. An
initial version of the optimization model is developed in this thesis.


Together, the three theses will provide a powerful tool for analysis of
strategy-based optimal reserve force stationing.

DISCLAIMER


The views expressed in this document are those of the author. They do not reflect
official policy, regulatory requirement, or position of the Department of Defense or the
government. The reader should understand that the author decided how to resolve
discrepancies among data elements. While every effort was made to ensure that the
computer programs are free of logic and computational errors, the reader must use the
information with discretion. Any usage beyond the scope of this thesis is at the risk of
the user, and validation should be performed prior to future application. The terms he,
him, and himself refer to both genders, masculine and feminine.

EXECUTIVE SUMMARY


Trained and ready units are the key to the success of America’s Armed Forces. 
The drawdown of United States Armed Forces over the past decade and a half causes 
great reliance on the Reserve Components. With this increased reliance, unit fill 
becomes increasingly important to unit deployment schedules and Homeland Security. 
Unfilled units degrade personnel and training readiness. This thesis develops a
three-phase modeling process that will greatly assist with the analysis of this readiness issue.


Manning United States Army Reserve (USAR) units is fundamentally different
from manning Regular Army (RA) units. A soldier assigned to a USAR unit must live
within 75 miles or 90 minutes of his Reserve Center (RC). This makes USAR unit
positioning a key factor in the ability to recruit to fill the unit.


This model addresses this problem by looking at specific demographic,
vocational, and other ZIP Code factors of interest. This thesis is Phase I of a
three-thesis effort to address this problem. The three phases are:
Phase I: Process Definition, Data Collection, and Data Scrubbing. 
Phase II: MOS Build — Populate Data Fields for the Optimization Model. 
Phase III: Construct and Complete the Optimization Model. 


Since the entire model is a huge undertaking, the focus of this thesis is Phase I. Prior to
any analysis, data collection and data scrubbing take an enormous amount of time and
effort. In this thesis, we assemble the data on over 30,000 ZIP Codes, over 800 RCs, and
over 260 Military Occupational Specialties (MOSs), drawing on and integrating over a
dozen disparate databases. Phase I is an exercise in data mining, data manipulation, data
acquisition, and data source identification.


This effort produced a single table with demographic, vocational, and economic
data on every ZIP Code in America, along with the six-year results of RA, USAR, and
Sister Service recruit production. Data were also obtained on the quality of each recruit
and his suitability for each of the 264 Army MOSs.

Preliminary modeling developed a model that accounts for about 70% of the
variation in recruit production by ZIP Code. Models for the top five USAR MOSs were 
also developed to predict the maximum number of recruits obtained from a ZIP Code for 
that MOS. ZIP Codes vary in their ability to provide recruits with sufficient aptitude for
technical fields, as examples in this thesis illustrate. This modeling gives
new explanatory and predictive capability. Surprisingly, unemployment rates had a small
inverse effect on these five models. The unemployment rate is statistically significant,
but may not be practically significant.


The second thesis in the series will develop models for all 264 MOSs and analyze 
them for commonalities and differences that reveal insights about recruit production for 
the USAR. This will also identify the regional propensity of the market to join the 
USAR. The third thesis will use those models as constraints in a mixed integer linear 
program that positions the RCs to maximize their ability to man their units. The
assignment of RC market ZIP Codes to maximize unit fill rates leads to increased unit
readiness. This thesis creates an initial version of this program.


This thesis automates the process of assembling and reconciling key data files
using a commercial data-mining package called Clementine. We document that process
so that future analysts can avoid the nearly three man-months of work needed to create an
updated master data file of over 30,000 by 430 cells. This is a major contribution.


These results support the solution of the unit fill rate problem and address many 
of the issues associated with determining the appropriate demographic, economic, and 
vocational factors of RC markets. Together these three theses will provide a powerful 
tool for analysis of optimal reserve force stationing. This will greatly improve the 
readiness of the Reserve Components, unit deployment schedules, and Homeland
Security.

I. INTRODUCTION, BACKGROUND, AND SOURCE


A. INTRODUCTION 


Trained and ready units have been the key to success of America’s Armed Forces. 
Without a trained and ready force, we cannot support and defend the nation. The Army 
Accession Command has the responsibility to fill the ranks of the Army. One of its 
subordinate units is the United States Army Recruiting Command (USAREC). USAREC 
has the responsibility to achieve both its Regular Army (RA) and United States Army 
Reserve (USAR) annual accession missions. Without soldiers, the armed forces cannot 
begin to be ready and trained. Unit fill is the first step in achieving ready, trained, and
deployable units.


This thesis focuses on recruitment quality and unit placement, with respect to the 
population, to meet force structure objectives. This thesis develops a model to analyze 
the complex process of filling the USAR Troop Program Unit (TPU) vacancies. The 
model determines factors associated with unit fill rates by Military Occupational 
Specialty (MOS). The model looks at unit positioning, assesses the quality of potential 
recruits, and includes demographic considerations to determine potential success in the
market by MOS. The MOS fill rate is as follows:

\[
\mathrm{FILL\_RATE}_{\mathrm{MOS\,\&\,SKILL\_LEVEL}}
  = \frac{\mathrm{ON\_HAND}_{\mathrm{MOS\,\&\,SKILL\_LEVEL}}}
         {\mathrm{AUTHORIZED}_{\mathrm{MOS\,\&\,SKILL\_LEVEL}}}
\]

Equation 1.1: MOS Fill Rate Equation


For example, if we have a unit with 15 63B10 (skill level 1) authorizations and 5 63B20 
(skill level 2) authorizations and it had 10 63B10s on-hand and 3 63B20s on-hand, the fill 
rate for skill level 1 and 2 63Bs would be 0.667 and 0.600, respectively. Modeling the 
process of filling unit vacancies will greatly assist in accessing the requisite number of
young men and women soldiers for America’s Army.
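
The arithmetic of Equation 1.1 can be checked with a short Python sketch. The function name `fill_rate` is a hypothetical helper introduced only for illustration; the numbers are the 63B example given above.

```python
def fill_rate(on_hand: int, authorized: int) -> float:
    """Fill rate for one MOS and skill level: on-hand divided by authorized."""
    if authorized <= 0:
        raise ValueError("authorized must be a positive number of positions")
    return on_hand / authorized


# The 63B example from the text: skill level 1 and skill level 2.
rate_63b10 = fill_rate(on_hand=10, authorized=15)
rate_63b20 = fill_rate(on_hand=3, authorized=5)

print(f"63B10 fill rate: {rate_63b10:.3f}")  # 63B10 fill rate: 0.667
print(f"63B20 fill rate: {rate_63b20:.3f}")  # 63B20 fill rate: 0.600
```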


I call this model the Unit Positioning and QUality Assessment Model 
(UPQUAM). UPQUAM is a marketing and enlisted quality assessment tool used for 
conducting strategic USAR unit positioning and quality assessment for USAREC and the 
United States Army Reserve Command (USARC). 


The USARC consists of 10 Regional Support Commands (RSCs) comprising over 
4200 individual TPUs plus 4 other Army Commands (ARCOMs) to support its mission 
responsibilities. For National Security and Homeland Defense, these RSCs are aligned
with the Federal Emergency Management Areas (FEMAs).


The location of US Armed Forces Reserve units plays an important role in 
Homeland Security issues as well as National defense posturing for success on the 
battlefield. Note that the former Continental United States Armies (CONUSAs) have 
been realigned with the FEMAs. The reason was to provide a support infrastructure for 
Homeland Defense in each FEMA. This analysis examines the relation between unit 
location and recruiting success. We consider how to maximize the fill rate of
USAR units through regression and optimization.


The model takes as inputs USAR unit structure, location, and historical quality of
enlistment contracts. It uses a threshold value, for each MOS, based on Armed Services
Vocational Aptitude Battery (ASVAB) Line Score Categories (LSCATs). There
are ten LSCATs, which determine the minimum requirements for qualifying
for a particular MOS. The average LSCATs for each ZIP Code and Reserve Center (RC)
will determine the type(s) of MOSs supported by the population surrounding the RC.
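
As a rough illustration of the threshold idea, the following Python sketch compares a set of average line scores against per-MOS minimums. It is a sketch under stated assumptions, not the thesis model: the MOS labels, the minimum scores, and the averages are all hypothetical placeholders, and the function name `supported_mos` is introduced only for this example.

```python
def supported_mos(avg_line_scores, thresholds):
    """Return the MOSs whose every minimum line score is met by the averages."""
    return sorted(
        mos
        for mos, minimums in thresholds.items()
        if all(avg_line_scores.get(cat, 0) >= low for cat, low in minimums.items())
    )


# Hypothetical minimum line scores; placeholders, not actual Army requirements.
THRESHOLDS = {
    "MOS_A": {"MM": 90},
    "MOS_B": {"EL": 95, "GT": 100},
    "MOS_C": {"GT": 100},
}

# Hypothetical average line scores for the ZIP Codes surrounding one RC.
rc_average = {"GT": 102, "MM": 95, "EL": 88}

print(supported_mos(rc_average, THRESHOLDS))  # ['MOS_A', 'MOS_C']
```

In this toy case MOS_B is excluded because the average EL score falls below its assumed minimum, even though the GT requirement is met.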


This thesis models the number of recruits a ZIP Code should produce, and the 
maximum number of recruits with sufficient skills for each MOS. This is a necessary 
input to the UPQUAM model, which will be completed in a subsequent thesis. The 
combined analysis will give insight as to the proper districting of RC areas and specific
locations for USAR units throughout the US. The analysis illustrates the issues associated
with the unit vacancy fill problem of TPUs in the USAR.


B. BACKGROUND 


One of many missions of the USAR is to recruit to fill its ranks. USAREC 
administers this responsibility by recruiting, assessing, and accessioning to fill USAR 
TPUs. Maintenance of quality soldiers for the USAR is a TPU responsibility. The 
recruiters’ mission greatly hinges on the ability of the market (the population) to support
the USAR units in their respective locations.


Filling RA and USAR units requires different approaches. Recruits filling RA 
units are accessed, attend training, and then are sent to their units worldwide without 
respect to their place of entry. USAR units are, normally, filled by personnel recruited 
within 75 miles or 90 minutes commuting time. This constraint is imposed to reduce the 
financial burden on soldiers. This geographical limitation, at times, may hamper unit fill. 
This occurs because personnel necessary to fill the unit are taken from a geographical 
region and there may or may not be sufficient numbers of qualified personnel in the
region suited to join the units.


This analysis focuses on USAR force structure and the geographical constraints 
placed on units with respect to the local population. Filling unit vacancies comes at a 
price. Historically, fill rates of units (the percentage of required personnel in certain
geographical locations) have not been at appropriate readiness levels.


There are two sets of qualified applicants, Prior Service (PS) and Non-Prior
Service (NPS) personnel. These two pools of personnel form the available population.
The Army considers the Military Available (MA) population to be those individuals aged
17-29.5 who are mentally, morally, and medically qualified for military service. The NPS
set is those individuals aged 17-21 and the PS set is those individuals aged 22-29.5.


The USAR is ultimately responsible for filling its ranks. However, USAREC is 
responsible for recruiting the NPS set and the USAR is responsible for the PS set. PS 
personnel, as the name indicates, have previously served. To administer the PS 
responsibility, the USAR maintains a database of qualified soldiers to deploy when
needed. The motivation for PS personnel to stay is greatly influenced by their respective
unit experiences. Since this is a TPU responsibility and not a focus of this study, we will
not consider the PS set.


Instead, the analysis focuses on the NPS set. This is the harder set for data 
assembly and analysis. Recruitment for a particular position is based on its vacancy. 
Readiness, as previously stated, is a function of personnel. To have ready and trained 
units, the USAR must first train the personnel it recruits to perform specific tasks or 
missions. Recruits must have sufficient aptitude to be task trained, and are tested to see if
they do.


The collection of skills for a position has an associated MOS. Soldiers receive
MOS training in two phases. The first phase, indoctrination, is called Basic Training
(BT). This is where soldiers receive training in basic combat skills. The second phase,
the skill set for an MOS, is called Advanced Individual Training (AIT). Each position in
a unit has an associated MOS and experience level. Not all vacant positions in a unit are
at a novice level. As a soldier gains experience and expertise, he becomes responsible for
additional skills within his MOS.


The unique challenge for the USAR is the traveling constraint for unit personnel 
reporting for duty. As previously stated, this limit is currently 75 miles or 1.5 hours 
commuting time to the unit. Commuting distance for a unit headquartered in rural areas 
differs from those in suburban areas because of traffic. It may take just as much time to 
travel 25 miles in suburban areas as it does to travel 75 miles in rural areas. Therefore,
geographical location of units with respect to the population is a major consideration.
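
The two travel limits can be expressed as a pair of 0/1 indicators, one for distance and one for time. The sketch below is illustrative only: the function name `commute_indicators` and the sample trips are assumptions, and how the two indicators are combined is left to the analyst.

```python
MAX_MILES = 75.0     # distance limit stated in the text
MAX_MINUTES = 90.0   # commuting-time limit stated in the text (1.5 hours)


def commute_indicators(distance_miles: float, travel_minutes: float):
    """Return (distance_ok, time_ok) as 0/1 indicators for one ZIP-RC pair."""
    return int(distance_miles <= MAX_MILES), int(travel_minutes <= MAX_MINUTES)


# Hypothetical trips: a rural drive, a congested suburban drive, a fast highway drive.
print(commute_indicators(70, 80))    # (1, 1)
print(commute_indicators(25, 100))   # (1, 0)
print(commute_indicators(80, 85))    # (0, 1)
```

The suburban case shows why distance alone can misstate the market: a short trip in miles may still exceed the time limit.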


Personnel with different skills may be more apt to join units demanding these
skills. The Bureau of Labor Statistics (BLS) and the United States Bureau of the
Census (USBC) collect data about vocational aptitudes. This thesis considers eleven
different vocational categories for the workforce. There is a clustering of USAR MOSs
to these eleven vocational categories. We determine the inclusion of these vocational
categories as we conduct a regression analysis.


Currently, the types and markets of some units do not align. Some local markets
cannot adequately support the unit requirements. This is cause for concern, especially if
the unit has a high priority for deployment. Unit fill is essential for readiness. Improving
the unit location with respect to the local market may make more effective use of the MA
population. TPU alignment within its respective market should be such that the
recruiting mission is attainable. TPU structure positioned to draw on the local vocations
is one way to accomplish the recruiting mission. An extension of our model allows an
optimal RC unit-stationing plan, and I discuss this in Chapter V.


The primary purpose of this thesis is to determine which demographic factors 
affect unit fill rates. There may be several causes for the lack of unit fill over time such 
as unit attrition, unit climate, market, recruitment efforts, population demographics, 
unemployment rates, quality, mission goals, or other factors. It is the responsibility of 
both USAREC and the USARC to determine what they individually and jointly can do
about the lack of fill. These unit fill rates are key inputs into the larger positioning problem.


Insufficient unit fill itself gives no indication as to specific causes. If unit
shortages are left unattended, the results can be devastating to Homeland and National
Security. Current policy and regulatory requirements incorporate some methods to
relocate and reposition structure. There is a need for additional methods and policy to
ensure unit fill. If this analysis proves beneficial, the Chief, Army Reserve (CAR) and
USARC Force Structure personnel should adopt a strategy of repositioning structure in
accordance with this analysis.


C. PROBLEM AND SOURCE 


1. Underlying Problem 
The complexity of this problem is too vast for one thesis. To manage the process,
I will break it down into three components.

1. Phase I: Process & Model Definition, Data Collection, and Data Scrubbing.
2. Phase II: MOS Build - Populate Data Fields for the Optimization Model.
3. Phase III: Construct and Complete the Optimization Model.


Phase I is the focus of this thesis. The Linear Program (LP) or Non-Linear Program
(NLP) that will eventually complete this process will consist of data, variables, an
objective function, and constraint set sections. We will define a preliminary optimization
model in this thesis and capture the necessary data elements. I will also summarize a
great deal of the constraint set. The eventual optimization model should consist of and
resemble the following:


INDICES and SETS:

i    ZIP Code of interest (00010...99985) [1,...,10^5]
j    MOS of interest (00B...98Z) [1,...,264]
k    Reserve Center (the current number of RCs) [1,...,829]

PARAMETERS:

max_recruit_zip_{i}          Maximum number of recruits obtained at ZIP i [1]
max_recruit_zip_mos_{i,j}    Maximum number of recruits obtained at ZIP i of MOS j [2]
target_mos_rc_{j,k}          Target MOS j at RC k
zip_rc_dist_{i,k}            1 if ZIP i is within 75 miles of RC k; 0 otherwise
zip_rc_time_{i,k}            1 if ZIP i is within 1.5 hours of RC k; 0 otherwise
weight_unit_{k}              Weighting (priority) of unit at RC k assigned by OCAR
                             [tier 1 = 1, tier 2A = 2, tier 2B = 3, tier 3 = 4, tier 4 = 5, tier 5 = 6]
weight_mos_{j}               Weighting (priority) of MOS j assigned by OCAR
                             [Top 15 = 1, 2, ..., 15; all others = 16] [3]
max_flow                     Maximum flow from any ZIP-RC arc

VARIABLES (Note: All variables are non-negative):

FLOW_{i,j,k}                 Flow from ZIP Code i to MOS j to RC k
ZIP_RC_{i,k}                 1 if ZIP i is in the market of RC k; 0 otherwise
FILL_MOS_RC_{j,k}            Fill of MOS j at RC k
OVER_MOS_RC_{j,k}            Number of personnel over 100% fill of MOS j at RC k
UNDER_MOS_RC_{j,k}           Number of personnel under 100% fill of MOS j at RC k

FORMULATION:

\[
\min \; \sum_{k} \mathrm{WEIGHT\_UNIT}_{k} \sum_{j} \mathrm{WEIGHT\_MOS}_{j} \cdot \mathrm{UNDER\_MOS\_RC}_{j,k}
\]

subject to

\begin{align}
\sum_{j,k} \mathrm{FLOW}_{i,j,k} &\le \mathrm{MAX\_RECRUIT\_ZIP}_{i} && \forall\, i \tag{1} \\
\sum_{k} \mathrm{FLOW}_{i,j,k} &\le \mathrm{MAX\_RECRUIT\_ZIP\_MOS}_{i,j} && \forall\, i,j \tag{2} \\
\sum_{k} \mathrm{ZIP\_RC}_{i,k} &\le 1 && \forall\, i \tag{3} \\
\mathrm{FLOW}_{i,j,k} &\le \mathrm{ZIP\_RC}_{i,k} \cdot \mathrm{MAX\_FLOW} && \forall\, i,j,k \tag{4} \\
\mathrm{ZIP\_RC}_{i,k} &\le \mathrm{ZIP\_RC\_DIST}_{i,k} && \forall\, i,k \tag{5} \\
\mathrm{ZIP\_RC}_{i,k} &\le \mathrm{ZIP\_RC\_TIME}_{i,k} && \forall\, i,k \tag{6} \\
\sum_{i} \mathrm{FLOW}_{i,j,k} &= \mathrm{FILL\_MOS\_RC}_{j,k} && \forall\, j,k \tag{7} \\
\mathrm{FILL\_MOS\_RC}_{j,k} - \mathrm{OVER\_MOS\_RC}_{j,k} + \mathrm{UNDER\_MOS\_RC}_{j,k} &= \mathrm{TARGET\_MOS\_RC}_{j,k} && \forall\, j,k \tag{8}
\end{align}

[1] max_recruit_zip_{i} = f(demographic factors)
[2] max_recruit_zip_mos_{i,j} = g(demographic ZIP Code factors)
[3] May consider regionalization of MOS priority

Constraints 1 and 2 above are formulated by using the methods of this thesis.
Variable construction in this manner provides control of the MA population in the ZIP
Code. Note that some ZIP Codes are larger than others. The objective function
minimizes the shortages of personnel by MOS, weighting each MOS, and weighting RCs
by priority. The optimization distribution model depends on the outcome of the findings
of the MOS Build in Phase II. The outcome of the specific MOS analysis will determine
the actual model form. Programming the constraints achieves the following:
1. Limits the number of recruits per ZIP Code to its maximum level;
2. Limits the number of recruits in a given MOS per ZIP Code to its maximum level;
3. Limits each ZIP Code to at most one RC or a separate ZIP Code distribution
   plan to share market ZIP Codes (this feature can be relaxed);
4. Forces flow from a ZIP Code outside its allowed RCs to zero;
5. Excludes ZIP Codes from RCs that are too far (distance);
6. Excludes ZIP Codes from RCs that are too far (time);
7. Balance equation showing personnel assigned by MOS in an RC;
8. Balance equation for Fill, Target, Over, and Under constraints.


This thesis determines the bounds for the constraints of types 1 and 2.
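
To make the role of each constraint concrete, the following Python sketch checks a candidate plan against constraints 1 through 6 and then computes the weighted shortage from the balance equations 7 and 8. It is a minimal evaluator on toy data, not an optimizer and not the Phase III model: the function name `evaluate_plan`, the dictionary layout, and every number in the example are assumptions made for illustration. The dictionary keys follow the parameter names listed in the formulation.

```python
from collections import defaultdict


def evaluate_plan(flow, zip_rc, data):
    """Return the weighted shortage of a plan, or None if it breaks (1)-(6).

    flow:   {(i, j, k): recruits sent from ZIP i into MOS j at RC k}
    zip_rc: {(i, k): 1 if ZIP i is assigned to the market of RC k, else 0}
    """
    by_zip = defaultdict(float)      # left-hand side of constraint (1)
    by_zip_mos = defaultdict(float)  # left-hand side of constraint (2)
    fill = defaultdict(float)        # FILL_MOS_RC, from balance equation (7)

    for (i, j, k), f in flow.items():
        # Non-negativity, and (4): flow is allowed only on assigned ZIP-RC arcs.
        if f < 0 or f > zip_rc.get((i, k), 0) * data["max_flow"]:
            return None
        by_zip[i] += f
        by_zip_mos[i, j] += f
        fill[j, k] += f

    # (1) and (2): recruit ceilings per ZIP Code and per ZIP Code and MOS.
    if any(t > data["max_recruit_zip"].get(i, 0) for i, t in by_zip.items()):
        return None
    if any(t > data["max_recruit_zip_mos"].get(key, 0) for key, t in by_zip_mos.items()):
        return None

    # (3), (5), (6): at most one RC per ZIP Code, within distance and time limits.
    rcs_per_zip = defaultdict(int)
    for (i, k), assigned in zip_rc.items():
        rcs_per_zip[i] += assigned
        if assigned > data["zip_rc_dist"].get((i, k), 0):
            return None
        if assigned > data["zip_rc_time"].get((i, k), 0):
            return None
    if any(count > 1 for count in rcs_per_zip.values()):
        return None

    # (8): with OVER and UNDER non-negative, the smallest feasible UNDER is
    # max(0, TARGET - FILL); the objective weights it by unit and MOS priority.
    objective = 0.0
    for (j, k), target in data["target_mos_rc"].items():
        under = max(0.0, target - fill[j, k])
        objective += data["weight_unit"][k] * data["weight_mos"][j] * under
    return objective


# Toy data: two ZIP Codes, one MOS, one RC. All values are hypothetical.
data = {
    "max_recruit_zip": {"Z1": 5, "Z2": 4},
    "max_recruit_zip_mos": {("Z1", "88M"): 3, ("Z2", "88M"): 4},
    "target_mos_rc": {("88M", "RC1"): 8},
    "zip_rc_dist": {("Z1", "RC1"): 1, ("Z2", "RC1"): 1},
    "zip_rc_time": {("Z1", "RC1"): 1, ("Z2", "RC1"): 1},
    "weight_unit": {"RC1": 1},
    "weight_mos": {"88M": 2},
    "max_flow": 10,
}
zip_rc = {("Z1", "RC1"): 1, ("Z2", "RC1"): 1}
flow = {("Z1", "88M", "RC1"): 3, ("Z2", "88M", "RC1"): 4}

print(evaluate_plan(flow, zip_rc, data))  # 2.0
```

In the toy case the RC receives 7 of its 8 targeted recruits, so the shortage of 1 is weighted by 1 (unit) and 2 (MOS). A solver would search over `flow` and `zip_rc` for the plan with the smallest such value.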


This formulation assigns ZIP Codes to RCs. A subsequent formulation will
assign RCs to a given ZIP Code, and the other ZIP Codes to that RC. By changing the
units assigned to a given RC, target_mos_rc_{j,k} changes. This allows exploration of
different assignments of units to existing RCs. This, too, can be explored in Phase III.


Unfilled unit positions hurt the readiness and training of USAR units. Unit 
positioning with respect to the population has also been a long-term problem. Finding an 
adequate number of high-quality recruits has also been a problem for units with positions 
requiring higher MOS ASVAB line scores. The development of a unit positioning and 
quality assessment tool will greatly assist unit fill and retention rates. This three-phase 
model will provide insights and help solve one of the most complex problems facing the 
USAR. It will involve the development of several tools and analyses. Once complete, it
will greatly improve OCAR’s ability to manage the reserve force.


2. Source

Finding a single cause of TPU unfilled vacancies is very difficult. Historical fill
and retention rates of USAR TPUs in their respective geographical locations may give
insight as to potential reasons. To study the system we need to determine factors
associated with inability to fill TPU vacancies. There are several reasons for the inability
to fill the units, and unit location may prove to be most significant.


Figure 1.1 shows the actual USAR TPU locations. There are 829 Reserve Center 
(RC) stations housing more than 4,200 units. Historically, a unit’s actual geographical 
location is associated with unit fill rates (USAREC, National Market Analysis (NMA), 
2000). Not having sufficient numbers of qualified military recruits available in a market
(population) definitely influences the fill rate of a unit and its readiness.


The USAR currently adopts a policy of relocating units having fill problems by 
use of Market Supportability Studies (MSSs) provided by USAREC. This has proven 
beneficial over time. As the USAR relocates units into better markets, unit fill rates have 
increased. However, the MSSs provided by USAREC consider only the volume metric
for the population. This analysis considers not only the volume but also market quality
and vocation.


Figure 1.1: USAR TPU Locations


The overlay of Figure 1.1 and Figure 1.2 demonstrates that USAREC recruiting
station locations are often in close proximity to TPUs. Each unit has many MOSs
USAREC attempts to fill. The national fill priority for MOSs takes precedence over
locally needed MOSs. Some of the problems causing poor fill rates may be TPU
attrition, recruiting difficulties pertaining to unit stationing and resources, the draw-down
of US Forces, local economic situations, structure changes associated with changing
missions, TPU deployments, and the competition associated with population vocational
availability. Other problems include education and skill training availability, job market,
economy, unemployment rates, and sister service competition.

Figure 1.2: USAREC Recruiting Station Locations


Figure 1.3 demonstrates that TPUs are properly located near the MA population.
These are Hot Spot Projection maps. They are thematic mappings of population
information buffered on aspect intervals (75-mile radii) using the 2000 US Census data
for the MA population.


These maps demonstrate coverage and stationing of both USAR TPUs and 
USAREC recruiting stations with respect to the markets. The upper left graphic shows 
the actual placement of USAREC recruiting stations, while the upper right graphic shows 
the actual USAR TPU placement in the market. The lower graphic is an overlay of both 
the stations and TPUs with respect to the markets. This graphically demonstrates the
recruiting coverage for the TPUs.


With a few exceptions, Figure 1.3 strongly suggests the recruiting stations are
properly aligned with TPU locations in the market. USAREC Marketing personnel
carefully review these exceptions and make minor adjustments to station recruiting
missions for TPU coverage. This information and the manner in which USAREC
conducts its mission and market planning to provide coverage for the TPUs, along with
provisions for high priority TPUs, suggest that TPUs are located with respect to the
market.


Although unit locations appear to be aligned with the population, it is possible 
that TPU force structure may be misaligned within their respective markets. Looking at 
the vocational aspects of the market may shed light on this consideration. The type of 
employment available in geographical locations affects personnel availability for unit fill. 
The analogy for the argument is that if a steel manufacturing plant is to be built in a
particular location, it requires sufficient personnel, within commuting distance and with
certain vocational skills, to operate the facility. The unit fill potential is the extent to
which a unit can expect to find the requisite number of skilled personnel in the local
market. Because a unit should be close to the supporting population, the Army targets
the recruitment of personnel to fill a unit based on the MA population within 75 miles or
a 90-minute commute. Recruits may join a unit outside this range, but this is an
exception to policy rather than the rule.

Figure 1.3: USAR Market Alignment, Hot Spot Projection Map (panels: USAREC
Recruiting Stations; USAR Reserve Centers; USAREC Recruiting Stations and USAR
Reserve Centers. Shading: Projected Population, Aged 17-29. Source: The Author)


The vocational support available to fill a unit’s vocational requirements can be
determined by matching the unit’s MOSs to the local workforce’s vocational availability.
This latter information is available from the Bureau of Labor Statistics (BLS) by ZIP
Code. With the BLS data, we can identify market vocations. We can specify the top
eleven vocational aptitudes of the ZIP Code. We can then ascertain whether
sufficient quantities of personnel are available to fill unit vacancies.


This is the reasoning behind the unit force structure breakout and stationing. A
battalion may not be successful at a particular location, but a smaller company or platoon
might. Regulations require the USARC to submit any proposed stationing actions or
changes to USAREC. USAREC is then responsible for conducting a Market
Supportability Study to ascertain the current force structure and determine if there is
sufficient MA population to support any changes.


Another tool assisting in this process is the Competitive Market Analysis -
Reserve (CMA-R). The CMA-R reports the local market availability of US Army and
sister service competition at an RC or other market levels. This tool enhances
USAREC’s ability to assist in market analysis by demonstrating what potential, if any,
exists in the market.


It may be beneficial to place our organizations in locations where the
organization’s vocations are similar to those in the market. For example, assume we have
a total of 1000 MA personnel for a particular RC, of whom 130 are identified as
transportation workers. Suppose further that two local trucking firms employ 150
over-the-road and long-haul transportation workers. Rhetorically, where do we locate our
units to draw on the market vocations?


Would a Transportation Battalion (Medium/Heavy Transport), requiring 630
personnel of whom 475 are actual truck drivers (MOS 88M), be successful in this
particular area? We need to know if there is other useful information available to the
TPUs and Regional Support Commands (RSCs). Based solely on volume results of the
MSS, we might conclude that it cannot be supported. But with the knowledge of local
market vocations, our conclusion could be different. Knowing market vocations may
assist in positioning units in those markets.
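
The numbers in this example can be laid side by side with a few lines of Python. The two ratios below are illustrative quantities chosen for this sketch, not measures defined by the MSS or by this thesis, and the variable names are assumptions.

```python
# Figures taken from the transportation example in the text.
ma_population = 1000            # MA personnel in the RC market
transport_workers = 130         # MA personnel identified as transportation workers
battalion_size = 630            # personnel required by the battalion
truck_driver_positions = 475    # MOS 88M positions within the battalion

# Share of the market already working in the matching vocation.
vocational_share = transport_workers / ma_population

# Matching-vocation workers available per 88M position to be filled.
coverage_per_position = transport_workers / truck_driver_positions

# Share of the whole MA market the battalion would need to enlist.
volume_needed = battalion_size / ma_population

print(f"Vocational share of market: {vocational_share:.1%}")      # 13.0%
print(f"Workers per 88M position:   {coverage_per_position:.2f}")  # 0.27
print(f"Market share needed:        {volume_needed:.1%}")          # 63.0%
```

A volume-only view sees just the last figure; the first two show the additional information that vocational data contributes to a stationing decision.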


Modeling the process of filling unit vacancies will greatly assist the recruiting 
efforts and TPU fill rates. There are several tools available to assist in unit fill. Existing 
tools are the NMA, MSS, and the CMA-R. The USAR cannot begin to be ready and 
trained without sufficient personnel. Determining factors associated with unit fill, unit
positioning, quality assessment, and demographic considerations for potential success in
meeting force structure objectives is the first step in achieving ready, trained, and
deployable units.


We want to position RCs to support recruitment for them. We hypothesize that 
recruitment is affected by demographics, vocational aptitude, and economy of the 
surrounding area. We want to model the recruiting potential by MOS and ZIP Code so 
we can enter this information as a constant in the optimization distribution LP model. To
model recruitment potential by MOS and ZIP Code, we must mine several large
incompatible databases to construct our data set.
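
The core of that assembly step is reconciling records from different sources on a common ZIP Code key. The Python sketch below shows the idea on three tiny tables. It is an illustration only and does not reproduce the Clementine process used in this thesis: the function names, field names, and values are all hypothetical.

```python
def normalize_zip(raw) -> str:
    """Return a 5-character ZIP Code string, restoring dropped leading zeros."""
    return str(raw).strip().zfill(5)


def merge_by_zip(*sources):
    """Combine several {zip: {field: value}} tables into one record per ZIP Code."""
    master = {}
    for source in sources:
        for raw_zip, fields in source.items():
            master.setdefault(normalize_zip(raw_zip), {}).update(fields)
    return master


# Hypothetical extracts; one source stores ZIP Codes as integers, which drops
# the leading zero and must be repaired before the tables can be joined.
census = {"02134": {"population_17_29": 5200}, 93943: {"population_17_29": 800}}
labor = {2134: {"unemployment_rate": 4.1}}
recruits = {"93943": {"usar_recruits_6yr": 12}}

master = merge_by_zip(census, labor, recruits)
for zip_code in sorted(master):
    print(zip_code, master[zip_code])
```

Each ZIP Code ends up with a single row holding whatever fields the sources supplied, which is the shape of table the regression models need.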


This data mining is an enormous task. We accomplish it, automate it, and
document it. Using our data set, we illustrate the recruit potential model for 4 key MOSs.
A second thesis can complete the recruit potential model for the other 260 MOSs and
analyze the model set for commonalities and distributions. A third thesis can implement
the full LP model and develop the optimal RC unit distribution plan.

II. SUPPORT, ISSUES, AND COURSE OF STUDY


A. SUPPORT AND POSSIBLE CAUSES 


Analysis to support USAR TPU fill is ongoing. There are other tools used in
conducting market research, operations analysis, and subsequent analysis of these items
of interest. USAREC provides additional support for market analysis in other forms
throughout the year. Some of the support includes the following: NMA, MSSs, CMA-R,
Demographic Support (USAR Enhanced Applicant File), Market Research Tools, Market
Estimates, Population Studies, Unit Attrition Studies, etc. If USAREC and the USARC
do a good job in supporting the RSCs and TPUs, what is the cause of the unit fill problem
experienced by some TPUs?


The fundamental problem appears to be determining causes for the unit fill 
problem. Within this scope, how do we determine the appropriate markets for TPU 
structure? Trying to define “appropriate” among 10 RSCs and over 4,200 TPUs is
challenging. What is considered appropriate for one may not be appropriate for the other.


Previously, we saw Figure 1.1 depicting the actual unit locations of the CONUS
USAR TPUs. There are significantly fewer than 4,200 TPU locations because multiple
units can be housed at one location. Cost of facilities is a key factor. Therefore, many
RCs have multiple units stationed at their locations. They may be grouped because they are
similarly typed, have the same higher headquarters, have a similar mission area, etc.
Army Regulations require that unit stations be shared among several organizations. There are
other factors influencing the outcome of unit stationing actions.


Other influences include, for example, historical and political boundaries. 
Examples are units traditionally located in areas such as Philadelphia, Boston, or some 
other area of historical significance. Some politicians firmly believe their constituents 
want to have units stationed in their legislative districts because “the unit has always been 


here” or the local economy needs the payroll. 
B. NON-DEMOGRAPHIC ISSUES AFFECTING UNIT FILL 


1. Considerations 

In this section, we list some issues affecting unit fill not included in the analysis 
of this thesis. Although we do not have data to conduct an analysis on the effects of 
enlistment inducements, it is important to mention them as part of the discussion of unit 
fill. Incentives may affect an applicant’s decision to join a unit when his original 
inclination was not to join or he wanted to choose another MOS that may not be available 


in a particular RC. 


A brief discussion follows on policy options for the CAR to provide enlistment 
incentives to better penetrate the market and acquire its skills. We will refer to this as 
regionalization. Regionalization also affects the market. Providing bonuses or other 
monetary incentives to the population is an enticement to enlist. We use enlistment 
bonuses to entice recruitment. 


MOS bonus and educational incentive programs greatly affect unit fill. Offering 
incentives supports the national fill requirements by MOS, but unit geo-demographic 
considerations may not have been supported. It may prove beneficial to localize 
incentive programs, thereby supporting the local commanders’ ability to offer bonuses and 
incentives to fill particular MOS requirements not listed as part of the national priority of 
needs. For example, suppose MOS 88M (Transportation Specialist) is listed as one of the 
national priority MOSs (the top fifteen undermanned MOSs) because of the 
collective fill rate of the MOS. However, it may not be the MOS needing to be filled in a 
particular region of the country. There may be a requirement to fill MOS 63B (Light 
Wheeled Vehicle Mechanic) in this area. It may prove beneficial to offer an incentive or 
bonus program for 63B as opposed to 88M in this particular region. 


Educational incentives may not be quite enough to convince an individual to join 
a unit for a particular needed MOS. However, having a regionally needed MOS 
associated bonus may be enough enticement for the same individual to enlist for the 
particular needed specialty. Otherwise the USAR might lose the individual to a sister 


service component which can satisfy the individual’s interest in a particular specialty. 
Regional impacts are significant when considering the long-term effects of unit fill on 
readiness. There are several national programs and activities affected when USAR units 
are not filled. They include: 


PROGRAMS                                  ACTIVITIES 

POM Projections                           Deployment Capabilities 
OMAR Funding Resources                    Unit Readiness 
CAR’s funding and resource allocation     Media Attention 
  level for successive fiscal years       Force/Power Projection Capabilities 
Enlisted Incentive Programs               Unit Leadership 
Educational Incentive Programs            Training 


The dilemma is what to do about the regional performance of USAR TPUs. 
TPUs have the responsibility to train for war. Their preparedness is instrumental to the 
success of this nation in achieving its goals. Prioritization is paramount to achieving fill 
rate success. Priority units have fill priority. The following two areas need consideration 
as well: 
1. Regional needs by Area Support Group (ASG) or some other methodology. 


2. Incentive and bonus needs by ASG or some other methodology. 


2. Demographics and Unit Positioning Effects on Fill Rates 

The rationale for conducting this study is based on the principle of local 
demographic effects. Size, type, employment, vocations, education, and other factors 
affect local markets. Recall that the USAR has a geographical constraint limiting its 


market draw to the population within 75 miles or a 90 minute commute. 


We will demonstrate the effects of demographics. We hypothesize that the local 
employment or unemployment rate has an effect on the fill rates of units. 


Force structure composition in local markets is important to unit fill. We can see 
these effects if the population majority, in a particular area, is more likely to join a 
maneuver unit than a transportation unit. If the USAR places or has transportation force 
structure in this area and the ARNG has armor or infantry force structure in the same 
location, transportation unit fill could suffer. 


Demographics and market composition must be addressed when deciding what 


force structure to place in a particular market. 


3. Deployment Tempo Inclusion 

The USAR TPU deployments have been on the rise in the last decade. Statistics 
indicate deployments are up 25% in the past decade. The USAR is being used at an 
increasing rate. However, at the time of this analysis, it was not feasible to obtain 
deployment data of USAR units. Deployment effects may not be seen until a few years 
after the unit redeploys to its home station. Further study in this area may reveal some 
peculiarities not yet discovered. Consideration of this topic should be included in further 


studies related to aspects of the unit fill problems. 


C. OBJECTIVES 


The overall project objective is to establish an optimization model for unit 
distribution by which to maximize unit fill in markets. The scope is limited by the ability 
to predict, forecast, or otherwise optimize the unit placement with respect to the 
population composition. The scope of this thesis is to define the process, define the 
optimization model, collect the data elements, and scrub these elements. This 
information will feed subsequent phases of the project, especially Phase II. Recall that 
Phase II establishes the constraint set of the optimization distribution model to complete 
the analysis. 


The goal of this thesis is to identify the supportability of TPUs by the size and 
quality assessment of the population. To do that we draw the appropriate data, 
summarize the data, and analyze current unit structure with respect to population 
supporting USAR unit fill rates in their current markets. We will establish whether 
current locations can support certain MOSs. We will accomplish this through regression 
analysis, modeling the number of expected contracts from each ZIP Code and the 
expected number of contracts by MOS each ZIP Code can support. The response 
variable will be contracts. The predictor variables will be the BLS vocational inclination 
data groups (11), MA population (1), Microvision 50 (MV50) Lifestyle segmentation 
categorized by groups (11), quality assessment via ASVAB scoring (10), quality 
assessment via Armed Forces Qualification Test (AFQT) (1), and unemployment rate (1) 
for each ZIP Code. 
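
As a minimal sketch of the kind of regression described above, the following Python 
fragment fits an ordinary least squares model of contracts on two predictors. The 
function name, the toy data, and the choice of plain OLS solved by the normal equations 
are assumptions made for illustration only; the thesis's fitted models use the full 
predictor set listed above and are not reproduced here. 

```python
def fit_ols(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    rows: list of predictor tuples, one per ZIP Code (an intercept is added).
    y:    list of responses (contracts per ZIP Code).
    Returns [intercept, coefficient_1, coefficient_2, ...].
    """
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)]
         + [sum(x[i] * yi for x, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical toy data: MA population (thousands) and unemployment rate (percent).
# The responses were generated as 2 + 3 * population + 0.5 * unemployment.
predictors = [(1.0, 4.0), (2.0, 6.0), (3.0, 5.0), (4.0, 7.0), (5.0, 3.0)]
contracts = [7.0, 11.0, 13.5, 17.5, 18.5]
intercept, b_pop, b_unemp = fit_ols(predictors, contracts)
```

Because the toy responses are an exact linear function of the predictors, the fitted 
coefficients recover the generating values up to rounding error. Real ZIP Code data 
would leave residual variation that the model does not explain. 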


We will focus on the efforts of the USARC and USAREC to accomplish their 
annual USAR enlisted accession mission. Specifically, we address the current TPU 
vacancy problem and the unit positioning or stationing problems. We will examine and 
understand some of the basic concepts associated with identifying the problem, arriving 
at a feasible solution, and communicating this information to the appropriate decision 


maker for action. 


USARC’s Force Structure analytical personnel are the audience for this thesis. 
Structure positioning with respect to market is one of the keys to success in filling unit 


vacancies. The right type of unit needs to be in the right market. 


We will determine and recommend to the Chief, Army Reserve (CAR) a more 
appropriate distribution of ZIP Codes to RCs so the current and projected markets can 


support the TPUs at their respective locations. 


D. COURSE OF STUDY 


We use regression techniques to study how to maximize the fill rate of USAR units. This 
regression uses predictor variables including BLS vocational aptitudes of the US population, 
MA population, ASVAB Line Scores, AFQT Scores, and MV50 segmentation 
information to gain insight into better unit stationing. We also seek to uncover better 
practices in stationing actions for USAR units. We would like to answer the following 
questions: 


1. Is there a methodology that enables the USAR to better station units with 
respect to the population demographics? 

2. Is there a significant correlation between unit fill and the vocational 
propensity of the market or ZIP Code of interest? 

3. Is there a significant correlation of local market competition factors such 
as job market unemployment rates, sister service human resource competition, and USAR 
ability to fill units in these areas? 

4. Does the market have sufficient population to meet structure or quality 
requirements necessary for a particular unit? 

5. What insights arise from analysis of the top or most prominent vocations 
in each market? 

6. What are the policy implications for the Chief, Army Reserve (CAR)? 

7. What is the effect of relaxing or tightening the commuting constraint? 


We explore and evaluate unit positioning with respect to geo-demographic 
considerations of respective recruiting markets. We identify and subsequently ignore 
those political encumbrances with respect to historical placement of some reserve units 
and the constituent population. Historical accessioning information and other relevant 
data determine the unit fill rate. 


We restrict modeling efforts to those methods involving linear transformations, 
regression applications, forecasting, and optimization techniques that give insight into 
significant relationships of unit positioning in a geo-demographic market. We will 
describe the equation of the “top” five MOSs with respect to the variables of interest. 
The collection of information must be at ZIP Code level of detail to create a model to 
distribute this information to an RC. A multitude of information is needed to 
determine the suitability of the MOS in the market. Major data elements used in the 
analysis include: 
a. US Postal Service ZIP Code Master File 
b. Bureau of Labor Statistics (BLS) Vocational Master File 
c. Fill Rates of USAR units by ZIP Code or market 
d. Force Structure File 
e. Local Area Unemployment Master Data File 
f. FIP Code Master Data File 
g. MOS Quality (QUALS) Master Data File 
h. Sister Service (Reserve) Accessioning Data 
i. All Army Accessioning Data 




III. DATA AND METHODOLOGY 


A. DATA SOURCES 

Data is essential for analysis. Although obtaining a data set sounds simple, 
putting the data into a useful format and ensuring it is free from obvious errors was the 
most complicated part of this analytical process. Placing this data into a useful form is an 
art as well as a science. All acquired data in this thesis was obtained without monetary 
expenditure on the part of the analyst. This in itself is a major feat. Appendix A (Table 
Definitions Dictionary) contains the obtained data on unit stationing, population statistics, 
force structure files, MA population, and vocational aptitudes of the entire US market by 
ZIP code or Federal Information Processing Standards (FIPS) code, referred to in this 
thesis as the FIP code. The FIP code identifies the state and county of origin of the data 
sample. Appendix A describes: 


a. US Postal Service ZIP Code Master File 
   (http://zip4.usps.com/zip4/zip_responseA.jsp); 
b. USAR Force Structure File (FRC_FILE); 
c. USAREC Military Available Population Data (PM03); 
d. Microvision 50 Lifestyle Segmentation Data (MV50); 
e. All Army Accession Data (ALLARMY); 
f. Sister Service Accession Data (SISSERV); 
g. Qualifications Data (QUALS); 
h. BLS Vocational Master File (P050); 
i. BLS/USBC Local Area Unemployment Data - County (LAUCNTY); 
j. BLS/USBC General Population Employment Data (gp.data.1.AllData); 
k. BLS/USBC General Population State Code Data (gp.state). 
B. DATA COLLECTION 

Appendix B (Table Data Fields and Descriptions) contains information on the 
tables used in the analysis. There were a number of sources used to obtain data. The 
actual data collection and preparation consumed more than two months of effort. The 
author was involved in the initial data-warehousing project at USAREC. This enabled 
faster data acquisition of the MA population, contract and accession, sister service, and 
segmentation information used in the analysis. Data warehousing greatly assists in 
reducing the amount of time required to obtain data elements for analysis. Data elements 
required about two weeks to acquire once the query for the data was formulated. Query 
formulation took approximately three days to accomplish. Without the data-warehousing 


capability, this data collection would have taken over two months to accomplish. 


While waiting on these elements, we had to find the vocational information and 
obtain access to this information by ZIP Code. The author’s spouse is a Field 
Representative for the United States Bureau of the Census (USBC), and helped. This 
data was obtained by tracking the information back through the Current Population 
Survey (CPS). These elements took approximately 3.5 weeks to collect. Once 
obtained, the data had to be manipulated from its source into a workable format for 
integration into final tabular form, taking another three days or so. Total time invested 
was approximately one month. 


Two other hard-to-acquire data sets are the Local Area Unemployment (LAU) 
county (employment and unemployment) data and the United States Postal Service 
(USPS) ZIP Code information. The unemployment data is collected and summarized by 
FIP Code, not by ZIP Code. Once located, this table was copied from the BLS website in 
text clipping format, as no file transfer protocol (FTP) site was available. The clipped 
text then had to be broken into useful fields, which required a data dictionary. We 
obtained one from the BLS and used it to segment the data into its useful pieces. There 
are over 2600 counties in CONUS. A great amount of effort was put into locating a ZIP 
Code to FIP Code table. 
The author recalled a five-year-old table having exactly what was needed. 
However, since the Postal Service changes ZIP Codes frequently, the data had to be 
checked and scrubbed for accuracy. The Postal Service currently has over 33,000 listed 
ZIP Codes, including all US possessions and territories. Since our concern was 
CONUS, this narrowed our scrub to approximately 30,000 ZIP Codes to verify. The only 
manageable way to accomplish the ZIP Code verification was to conduct it on-line 
through the USPS website. 
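
The two steps described above, segmenting the clipped LAU text with a data dictionary 
and carrying county figures down to ZIP Codes through a ZIP Code to FIP Code table, 
can be sketched as follows. This is only an illustration: the column layout, the field 
names, and the sample records are hypothetical and do not reproduce the BLS data 
dictionary or the author's FoxPro programs. 

```python
# Hypothetical fixed-width layout: field name -> (start, end) column offsets.
LAYOUT = {"fip": (0, 5), "labor_force": (5, 14), "unemployed": (14, 22)}


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width text line into named fields using the layout."""
    return {name: line[a:b].strip() for name, (a, b) in layout.items()}


def unemployment_by_zip(lau_lines, zip_to_fip):
    """Attach each county's unemployment rate to the ZIP Codes in that county."""
    rate_by_fip = {}
    for line in lau_lines:
        rec = parse_line(line)
        labor, unemp = int(rec["labor_force"]), int(rec["unemployed"])
        if labor > 0:
            rate_by_fip[rec["fip"]] = 100.0 * unemp / labor
    # ZIP Codes whose county has no LAU record are left out rather than guessed.
    return {z: rate_by_fip[f] for z, f in zip_to_fip.items() if f in rate_by_fip}


# Illustrative records only; the counts are made up.
lau_lines = ["06053   200000   10000", "21093    50000    3000"]
zip_to_fip = {"93940": "06053", "40121": "21093", "99999": "00000"}
rates = unemployment_by_zip(lau_lines, zip_to_fip)
```

Every ZIP Code in a county inherits the same county-level rate under this scheme, which 
is the coarsest reasonable assumption when the source data is published only by FIP Code. 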


The initial scrub confirmed over 27,000 ZIP Codes, leaving about 3,000 to check 
and verify by hand. This was a tedious task. The process took approximately 3 minutes 
per ZIP Code, working on-line through the USPS website, so the complete task took 
about 150 hours (3,000 ZIP Codes at 3 minutes each). If we had been able to purchase 
the current ZIP Code Master File, we might have been able to cut this task duration in 
half. 


Once the second scrub was complete, we had to resolve by hand over 700 ZIP 
Codes that were not available on the USPS website. However, I considered them critical 
for the analysis because the number of contracts produced by these ZIP Codes was 
greater than 5 per year. If not considered, we could have lost approximately 4,000 annual 
NPS contracts, out of an average annual USAR accession mission of 20,000 NPS. This 


process took about 3.5 weeks. 


Once we accomplished all these collection tasks, approximately two months had 
elapsed. As the data arrived, it was necessary to review and become familiar with it. 
Some data arrived without data dictionaries or other helpful items to understand the 
tabular contents. On review, I noticed that some requested items did not arrive in a 
proper format or were not included in the data sent by the provider. Calls and e-mails 
to verify data elements and request the missing items took over two weeks. 


Some peculiarities found in the data were: no labor force information for some 
ZIP Codes, no annual production for some ZIP Codes (result of changing ZIP Code data), 
incorrectly coded information, non-existent ZIP Codes, incorrectly classified lifestyle 
segmented data, etc. These were addressed in the development of the final data table 
containing the assembled ZIP Code information. 
Information arrived in varying formats. Formats included varying text file 
formats, spreadsheet files, data files, mainframe files, and varying database formats. All 
the collected information had to be finalized into one table containing all pertinent items 
with respect to each ZIP Code. The following sources were used in this analysis. 


1. United States Army Recruiting Command (USAREC) 

USAREC provided data on USAR accessions, listing each applicant’s Military 
Examination and Entrance Processing Station (MEPS) testing data, demographic data, 
and market segmentation information. USAREC also has in its repertoire of data the 
useful MA population (PM03) derived from a commercial source, Woods and Poole. This 
data was obtained with the assistance of MAJ Michael Kamei and Mr Rodderick Lunger, 
Programs Analysis & Evaluation Directorate, Headquarters, USAREC, Fort Knox, KY. 


The market segments were obtained from a commercial source as well. The 
clustered data, ZIP+4, were derived from MV50 segmentation data. This data contains 
50 market segments characterizing demographics, purchasing habits, etc. This data, 
along with the Army’s accessions data, spans from FY99 through end of FY03. 
USAREC also provided Sister Service data for the same time period. This data was 
obtained with the assistance of Mr Rodderick Lunger at (800) 223-3735 (x60358), 
Programs Analysis & Evaluation Directorate, Headquarters, USAREC, Fort Knox, KY. 


2. United States Bureau of the Census (USBC) 

USBC provided data on the vocational aptitudes of the entire working population 
listing each ZIP code’s actual vocational inclination using the P050 Tables from the 
USBC. We used the Current Population Survey (CPS) data to check the counts of the 
population and unemployment, and to cross verify the Military Available (MA) 
population from USAREC data. This data includes the 2000 Census and updates from 
the Current Population Survey (CPS) data for FY2002. This data was obtained with the 
assistance of Mrs Susan Fair, Field Representative, USBC; Mrs June Grillo, Senior Field 
Representative, USBC; and Mr Jamey Christy at (818) 904-6393, Regional Director, US 
Bureau of the Census, Los Angeles, CA. 
3. United States Postal Service (USPS) 

The analysis used the Master ZIP Code information from the USPS’s website, 
http://www.usps.com/zip4/citytown.htm. The MA population from USAREC PM03 table 
was cross-verified using the USPS website. 


4. United States Bureau of Labor Statistics (BLS) 

The BLS website, www.bls.gov, provided data on the employment statistics of the 
entire working population, by each FIP Code, listing the actual employment information 
using the General Population (GP) Tables by county and state from the BLS. The 
Current Population Survey (CPS) data was used to check the counts of the population, 
unemployment, etc. It was also cross-verified using the Master ZIP Code information 
from the USPS’s website. The data obtained from the USPS and BLS websites was in 
text clipping format. It was imported and manipulated using Microsoft FoxPro software 


into tabular database form for use in this analysis. 


5. Office of the Chief of the Army Reserve (OCAR) 

Additional data pertaining to USAR force structure (Force File), recruiting and 
accessioning priorities, fill priority, and USAR data descriptions were provided by Major 
Ward Litzenberg at (703) 601-3527, Programs Analysis & Evaluation Directorate at 
Office of the Chief of the Army Reserve (OCAR), Arlington, VA. 


C. DATA PREPARATION 

The data preparation took approximately 2.5 weeks to accomplish. Much 
manipulation, formulation, etc. had to be accomplished to get all the data elements into a 
common, useful, and usable format for integration. Several software packages 
accomplished the data preparation aspect of the analysis. The software used to organize, 
classify, assemble, derive, aggregate, and analyze the data was: Microsoft FoxPro 2.5 
(MAC OS), Microsoft Visual FoxPro 6.0, Apple’s Text Edit (MAC OS), Microsoft Word 
Pad, Microsoft Excel (MAC OS), Minitab 10.0 (MAC OS), S-Plus 6.1, and SPSS 
Clementine 8.0, a data mining software application. Microsoft FoxPro and Clementine 
8.0 produced the classification and integration of the data. FoxPro manipulated most of 
the data tables into a usable format. Once we created the usable format, we used 
Clementine to graphically demonstrate the data “flow”. 


Clementine is a data mining application presenting visual representations of data 
and their elements. It permits limited statistical and accounting operations. It lets the 
user visually lay out data preparation steps and select, filter, or “mine” the data and its 
elements. Data “streams” are groupings of different graphical operations from source to 
sink. 


These operations allow the user to demonstrate certain properties of the data. 
Operations performed by Clementine are: selecting, sorting, setting, appending, filtering, 
making distinctions, merging, filling, creating, deriving, and collection operations. Input 
nodes are circles, output nodes are boxes, operations nodes are hexagons (on fields and 
records), modeling nodes are pentagons, graph nodes are triangles, and supernodes are 


stars. A user can choose to place the most frequently used nodes in a “Favorites” palette. 


Figure 3.1 demonstrates node classification in Clementine. The nodes are 
graphically and statistically linked in the editor window. One of the first items to 
consider was to place the data into a usable format. Clementine and FoxPro enabled the 
data elements to be selected, sorted, assembled, and scrutinized. The figure shows the 
node types and varieties. There are source, record ops, field ops, graphs, modeling, and 
output nodes available for use. These classifications permit the performance of a myriad 


of operations for data manipulation, computations, modeling, and statistics. 


The analysis requires RC and ZIP Code level of detail. ZIP code level data 
formed the basis for the collection and arrangement of data elements to facilitate the 
analysis. The JOBMV50 table contains data from tables assembled by ZIP code. All 
tables containing ZIP code information were verified using the US Postal Service ZIP 
Code Master File located at http://www.usps.com/zip4/citytown.htm. The USPS web site 
verified over 33,000 and re-verified over 750 ZIP codes obtained from the various data 
sources. 
Figure 3.1: Clementine Example Nodes 


In Figure 3.2, we combined information, through data manipulation, of the 
JOBMV50NEW with MAPOPLAU to create JOBMVPOP. JOBMVPOP contains BLS 
vocational, MV50 Lifestyle Segmentation, MA population, and LAU information in one 
table. Through programming and data manipulation, FoxPro created JOBMV50NEW and 
MAPOLAU. JOBMV50NEW is a combination of BLS vocational and MV50 Lifestyle 
Segmentation information. MAPOLAU is the combination of the MA population and 
LAU information. 


Figure 3.2 demonstrates the results of data mining using Clementine 8.0 software. 
It shows the kinds of operations used to facilitate data manipulation. The details for the 
figure are as follows. The INPUT nodes (circular symbols), JOBMV50NEW and 
MAPOLAU, are on the left of the graphic. The next nodes (hexagonal symbols), reading 
left to right, are the TYPE nodes. These nodes confirm the type of data arriving and 
departing the TYPE nodes. The next two hexagonal nodes are called FILTER and 
SELECT nodes. They perform the record functions on the data flowing through them. 
The other nodes (rectangular symbols) are OUTPUT nodes. These nodes are terminal 
nodes. Data flows only into these nodes. 
Figure 3.2: Data Mining Using Clementine 8.0 Software 


Other nodes depicted in the graphic are SUPERNODES, OUTPUT nodes, and 
DERIVE nodes. The star nodes are SUPERNODES. They group an informational 
stream of nodes, combining their functions into a single node. Most often, a 
supernode denotes multiple functions of a similar type. It is also used to clean up 
the graphical flow of data manipulation into one function denoted by the SUPERNODE. 
The STATISTIC node is used to obtain certain statistical information about the 
stream. You can collect information about the stream of data by inserting one of these 
OUTPUT type nodes. As previously stated, these nodes are terminal nodes. Data only 
flows into these nodes. The information from the node cannot be used for input into any 
other stream. The last node depicted in the figure is the DERIVE node. Just as the name 
of the node suggests, it derives a field or multiple fields from other fields in the stream. 


As demonstrated, Clementine is a powerful piece of software which makes data 
mining simple and easy to understand. The data flows along the connectors 
(arrows), called streams. Streams are easily constructed and manipulated. The data 
flows along the stream paths, from source to terminal nodes, with operations performed 
on the data along the way to produce useful information. Looking at the input data and 
deriving a useful table saves both time and programming effort. Figure 3.2 shows 
the derivation of the MA population and MV50 segmentation data into the JOBMVPOP 
table. The table is a collection and assembly of data at ZIP code level. 


The analysis incorporates the PM03 MA population data. FoxPro 2.5 (MAC OS) 
and Visual FoxPro 3.0 Relational DataBase Management Systems (RDBMS) were used 
to bring the information to a useful format. PM03 determines the MA population for 


each ZIP code. We derive the MAPOP from the PM03 table using FoxPro. 


Figure 3.2 contains data from the varying sources summarized in the 
JOBMV50NEW (updated from JOBMV50) table. The two tables providing the principal 
source of information are the P050 and MV50 tables. The resulting table, JOBMV50NEW, 
is named for JOB, from the P050 table, and MV50, from the MV50 segmentation data. 
Also incorporated in the JOBMV50NEW table is the LAUCNTY table data. This 
information is the Local Area Unemployment (LAU) data by county for 2002. BLS and 
the CPS verified this information in 2003. It has the labor force, employed, unemployed, 
and unemployment rate figures by FIP code. 


One additional table supporting the JOBMV50NEW table is the gp.data.1.AllData 
table. This table provides the General Population (GP) employment information by FIP 
code. This table has the average annual historical unemployment rates from 1981 - 1998. 
It differs from the LAUCNTY table, in containing simply the unemployment rate figures 


for each FIP code along with comments on data specifics. 


Obtaining ZIP code detail about our data and population is key to the analysis. 
Unit authorizations, by MOS, are the basis of the analysis. The USAR Frc_File 
identifies unit authorizations and on-hand totals for all MOSs. Using Clementine, we can 
choose to include or exclude certain aspects of the data. In establishing the USAR 
Frc_File information, the scope of this analysis excludes the officer and senior enlisted 
force structure. This is done by the use of select nodes in Clementine. 


One item needed for the analysis is the target_mos_rc_;;. Once we obtain all the 
demographic information by ZIP code, we can begin other required assembly of the data. 
The first needed item is Army contract data. We want to determine how many contracts 
we obtained from each ZIP code to determine penetration rates of the market. 
Remember, the market is the collection of ZIP codes surrounding the RC within 75 miles. 
The RCMKT75 table is the origin of this information. The author created this particular 
table from RC ZIP Codes. We can determine the units needing personnel fill from the 
USAR Frc_File table. This table has the USAR force structure composition for each 
unit. For this analysis we will use an extract of the information in the Frc_File table 
called USARTOT. 
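
A market table such as RCMKT75 can be approximated by keeping every ZIP Code whose 
centroid lies within 75 miles of the RC. The sketch below is an assumption about how 
such a table might be built, not a description of how the author built it: it uses 
straight-line (great-circle) distance, made-up coordinates, and placeholder ZIP labels, 
and it ignores the alternative 90-minute commute criterion. 

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def miles_between(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two points (haversine formula)."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi, dlmb = p2 - p1, radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * asin(sqrt(a))


def market_zips(rc_location, zip_centroids, radius_miles=75.0):
    """Return the ZIP Codes whose centroid lies within the radius of the RC."""
    lat, lon = rc_location
    return sorted(z for z, (zlat, zlon) in zip_centroids.items()
                  if miles_between(lat, lon, zlat, zlon) <= radius_miles)


# Hypothetical RC location and ZIP centroids (latitude, longitude in degrees).
rc = (36.0, -121.0)
centroids = {
    "00001": (36.5, -121.0),  # about 35 miles north of the RC
    "00002": (37.0, -121.0),  # about 69 miles north of the RC
    "00003": (38.0, -121.0),  # about 138 miles north of the RC
}
market = market_zips(rc, centroids)
```

Changing `radius_miles` gives a simple way to explore the effect of relaxing or 
tightening the commuting constraint on the size of each RC's market. 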


The extract contains the enlisted population, specifically, the skill level 1 and 2 
force structure. Our focus is the problematic junior enlisted. Since we have the force 
structure, we know each MOS required at each RC. If needed, the model can later 
incorporate all the force structure. Armed with this information, we can use the QUALS 
table to ensure the population's ASVAB scores are high enough to qualify for the force 
structure at its current location. 


For example, Figure 3.3 demonstrates the use of Clementine to merge the 
information contained in the QUALS and USARTOT tables. During the execution of the 
MOS Quality Check table, Clementine displays the use of information by turning the 
input tables purple and the lines linking the data elements green. This shows the 
graphical representation of the flow of data and the operations performed on the data at 
each node. Appendix E (Clementine Screen Snapshots) contains details of all 


constructed streams of data collected, assembled, purged, and extracted. 


Here is a summary of the data inputs and derivations. ALLARMY2 created 
ALLARMYCLEAN and ALLARMY_MOSQualify. ALLARMYCLEAN has all “duplicate”, 
“no ZIP code”, and “no AFQT” records stripped from the original data source, 
ALLARMY2. ALLARMY_MOSQualify is the result of checking the LSCAT against each 
MOS in the inventory to see if the accession qualified for the MOS. If they qualified for 
the MOS, we increased the tally for the MOS for the particular ZIP code. The resulting 
table contains the MOS total qualified for the ZIP code. 
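
The tallying step described above can be sketched as follows. The sketch assumes, for 
illustration, that each MOS has a single minimum line score in one aptitude area; the 
field names, the score values, and the minimums shown are hypothetical and are not the 
Army's actual qualification standards or the contents of the QUALS table. 

```python
from collections import defaultdict


def tally_mos_qualified(accessions, mos_minimums):
    """Count, per ZIP Code and MOS, the accessions meeting the MOS minimum.

    accessions:   iterable of dicts with 'zip' and 'line_scores' (area -> score)
    mos_minimums: MOS -> (line score area, minimum score)
    """
    tally = defaultdict(lambda: defaultdict(int))
    for rec in accessions:
        for mos, (area, minimum) in mos_minimums.items():
            # A missing line score is treated as not qualifying.
            if rec["line_scores"].get(area, 0) >= minimum:
                tally[rec["zip"]][mos] += 1
    return tally


# Hypothetical minimums and accession records.
mos_minimums = {"88M": ("OF", 90), "63B": ("MM", 85)}
accessions = [
    {"zip": "93940", "line_scores": {"OF": 95, "MM": 80}},
    {"zip": "93940", "line_scores": {"OF": 88, "MM": 92}},
    {"zip": "40121", "line_scores": {"OF": 70, "MM": 100}},
]
tally = tally_mos_qualified(accessions, mos_minimums)
```

Each accession can add to the tally of several MOSs at once, so the per-ZIP totals 
measure how many recruits could have filled each MOS, not how many actually did. 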
Figure 3.3: Using Clementine to Conduct a Records Merge 


We transformed and manipulated ALLARMY_MOSQualify to derive the necessary 
information for the analysis. Dr. Samuel Buttrey, Naval Postgraduate School, using 
S-Plus code, performed the manipulation of the data to create the tallies for the MOS. We 
did not carry the column headings for each MOS as they were created since they are in 
numerical order. After the tallies were complete, we had to place the data back into 
columnar arrangement to complete the summary of the MOS by ZIP code. C code, 
programmed by Dr. Samuel H. Buttrey, completed the transformation of 
ALLARMY_MOSQualify. The author completed the assembly using S-Plus, MS Word 
Pad, and Clementine text OUTPUT nodes. We constructed, derived, and assembled the 
ARMYbyMOSbyZIP table using Clementine streams by merging ALLARMYCLEAN and 
ALLARMY_MOSQualify. 


To place the tables into a useful format required the merging of the four 
individual tables into one. Again, Appendix E contains the details of the merge. We 
merged JOBMVPOP, ARMYbyZIP, ARMYbyMOSbyZIP, and SISERVAFQT. We 
previously discussed the details of JOBMVPOP. ARMYbyZIP contains Army accession 
data by LSCAT and AFQT for each ZIP code. We previously covered the details of 
ARMYbyMOSbyZIP. Lastly, SISERVAFQT has the same information as ARMYbyZIP, 
except SISERVAFQT does not have the LSCAT for the Sister Service data. Sister 
Service data contains data for Marine Corps, Navy, Air Force, and Coast Guard Reserve 
Components. 


Table 3.1 shows a summary of the tabular information associated with the data
derivations and manipulations. It contains the file name, number of fields in the file, and
the record count for the tables. For example, JOBMVPOP has 32,873 records and 32
fields: 12 vocational, 12 segmentation, 8 population, and 1 ZIP code fields.
SISERVAFQT has 30,751 records and 29 fields: 9 AFQT, 19 test score category, and 1
ZIP code fields. ARMYbyZIP has 33,178 records and 66 fields: 12 vocational, 15 AFQT,
30 LSCAT, 8 test score category, and 1 ZIP code fields. Lastly, ARMYbyMOSbyZIP has
33,124 records and 266 fields: 264 MOS qualifications, 1 count, and 1 ZIP code fields.
When merged, these four tables combine into ALLDATAbyZIP, the final table
for the analysis. This table contains 29,865 records and 392 fields.


FILE              FIELDS   RECORD CNT
JOBMVPOP              32        32873
SISERVAFQT            29        30751
ARMYbyZIP             66        33178
ARMYbyMOSbyZIP       266        33124
ALLDATAbyZIP         392        29865

NOTE: The final ALLDATAbyZIP table is an inner join table containing fewer records than
any of the tables joined (fewer even than the smallest of them, which has 30,751 records).
I omitted some records with discrepancies and the inner join deleted incomplete ZIP Code
information. Thus the final ALLDATAbyZIP table contains 29,865 complete records.

Table 3.1: Clementine File Creation and Table Derivation Data
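
The note under Table 3.1 relies on the behavior of an inner join: only ZIP Codes present in every input table survive, so the result can never have more records than the smallest input. The sketch below illustrates that behavior with small, invented tables keyed by ZIP Code; the field names and values are hypothetical and this is not the Clementine stream itself.

```python
# Hypothetical miniature tables keyed by ZIP Code (values invented for illustration).
jobmvpop = {"93940": {"MA.POP": 5200}, "93901": {"MA.POP": 6100}, "00001": {"MA.POP": 10}}
armybyzip = {"93940": {"contracts": 6}, "93901": {"contracts": 6}}
sisvc = {"93940": {"afqt_1": 2}, "93901": {"afqt_1": 1}, "99999": {"afqt_1": 0}}

def inner_join(*tables):
    """Keep only the keys present in every table and merge their fields."""
    common = set(tables[0])
    for table in tables[1:]:
        common &= set(table)
    merged = {}
    for key in sorted(common):
        row = {}
        for table in tables:
            row.update(table[key])
        merged[key] = row
    return merged

all_data = inner_join(jobmvpop, armybyzip, sisvc)
print(len(all_data), "of at most", min(len(jobmvpop), len(armybyzip), len(sisvc)))
```

ZIP Codes "00001" and "99999" appear in only one input each, so the join drops them, just as incomplete ZIP Codes are dropped from ALLDATAbyZIP.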


This final table, ALLDATAbyZIP, represents almost three months of requesting,
collecting, manipulating, and assembling data. The latter parts, manipulation and
assembly, would have taken at least three times longer using software languages already
known and understood to the author. The learning curve associated with using and
understanding Clementine was about 2-3 weeks.


Clementine greatly assisted in the development of this analysis. The amount of
time devoted to getting the data into a usable format is approximately the same as using
other software programming languages. However, Clementine is a graphical tool that
accepts a multitude of input formats, whereas data must be in certain formats to work
with database or SQL programming languages. The advantage is that these streams of
information are already constructed; data updating can be an automated process without
the additional labor and worry of formatting required by other software languages.


Appendix E contains the detailed streams constructed in Clementine. The screen
snapshots can be understood by following the data streams and the node operations
performed on the data. Now that we have seen how to put the data into a useful format,
the next chapter develops the analysis.

IV. THE ANALYSIS


A. POSITIONING UNITS TO MAXIMIZE FILL RATES 


The study reveals there are several items of interest with respect to the unit 
positioning and quality assessment of the markets. Each ZIP code represents a unique 


contribution to the overall needs of the United States Army. 


Our original thought was to create data elements for a time series forecast and
analysis. We may be able to create a more effective model by applying time series
forecasting methods. We could accomplish the collection effort; however, our
current model has nearly 30,000 ZIP Codes and 432 predictor variables. With six years
of information (FY1998-FY2003) at 12 months per year, 72 times more information
would need to be collected. Therefore, our resulting data table would be approximately
30,000 by 30,000.
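
The arithmetic behind the "30,000 by 30,000" estimate can be checked directly. The sketch assumes that every one of the 432 predictor variables would be collected once per month, which is how we read the paragraph above; the variable names are ours.

```python
years = 6                      # FY1998 through FY2003
months_per_year = 12
periods = years * months_per_year          # number of monthly observations per variable

predictors = 432               # columns in the current table
zip_codes = 30_000             # approximate number of rows

monthly_columns = predictors * periods     # columns if each predictor is kept per month
print(periods, monthly_columns, zip_codes * monthly_columns)
```

With 72 monthly periods the column count grows from 432 to 31,104, which is the source of the rounded figure of 30,000 columns.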


If we were to use monthly time series, our data collection efforts would increase
72-fold, making the analysis nearly impossible on a stand-alone PC. The computational
effort needed might increase 72-fold or more depending on the processor. Current data
streams constructed in Clementine take nearly 42 minutes to run on a 2.80 GHz Pentium
IV processor with 1 GB of RAM, a 60 GB hard drive, and a LAN access server of over 300
GB.


We decided to use the 30,000 by 432 table for our contract and MOS regression
equations. As explained in Chapter III, understanding the data elements and their relation
to the analysis is key. To demonstrate the analysis, we will walk through fitting a model.


I created a table associating MOS to BLS vocations. This information will assist
in determining whether the market has a sufficient quantity of a particular vocation to
support our force structure. For example, why not locate an engineer construction
support company where the prominent vocations of the area are machine operators,
craftsmen, and laborers?


Using this consideration, we have a rationale to determine force structure
placement with respect to the market. Appendix C (Occupations and Working Class
Categories) contains the categorical occupations across the US. This tabular information,
from BLS and USBC, contains the most prominent vocations by ZIP code. We develop
regression equations for each MOS using this information as predictor variables. We
begin to understand why we have a problem. Misalignment of the vocations of the area
with the force structure can contribute to poor unit fill.


The next data item used for analysis is the LSCAT information obtained from all
Army contracts from FY1999-FY2003. This information contains the LSCAT scores
for each ZIP Code. LSCAT gives information about the quality of the accession.
Without it we do not know if we can support the specific jobs in the unit force structure.


Once we have found the MOS regression equations, we can determine which 
units can be supported by a unit’s particular ZIP code. Knowing this information will 
greatly assist in the constraint set development for the optimization distribution model in 


Phase III. This will assist in completing the MOS regression equations for Phase II. 


B. DATA FAMILIARIZATION 


We need to determine the appropriate predictor variables for each model. It is 
reasonable to assume that population, vocations, lifestyle segmentation, LSCATs, etc. are 
market influencers. The first question is: How many contracts can I expect to obtain 
from each ZIP Code? A cursory evaluation of data yields a correlation (0.7737) of the 
MA population and the number of contracts in the ZIP Code. This is reasonable since 


contracts should increase as the population increases. 


We next examine the data graphically. Figure 4.1 demonstrates the lifestyle
segment group percentages for the USAR contracts and the population. There are 11 of
these groups plus one segment with incorrectly grouped individuals (MV50GP00&99).
This segment grouping was the result of misclassified contracts. The figure shows that
some segments are recruited or join proportionally more than other segments.

Figure 4.1: USAR Contract & Population Percentage Distribution by Lifestyle
Segment Group


Note the distribution of contracts and the population. The distribution of the
MV50 Lifestyle Segment Groups for USAR contracts is similar to that of the population,
at least in the top 70% of the segment groupings. Segment Group 2 for the USAR is
41.35% compared to 41.29% for the population. Segment Group 4 for the USAR is
16.34% compared to 19.78% for the population. Finally, Segment Group 8 for the USAR
is 12.72% compared to 8.41% for the population.


Lifestyle Segment Groups 2, 4, and 8 represent over 70% of the USAR contracts. 
The distribution of the MV50 Lifestyle Segment Groups for the USAR is similar to the 
remainder of the Army. It appears as though the USAR contracts a large number of 
personnel from these three segment groups. Therefore, we expect that these segment 


groups will be represented in the final regression. 


Appendix D (Microvision 50 Lifestyle Segments) contains the segment
groupings. Segment Group 2 consists of Segments 10, 11, 16, 17, 18, 22, 35, and 38.
This grouping is composed of families. Segment Group 4 consists of Segments 8, 12, 15,
32, 34, 39, and 40. This grouping is composed of people who are single. Segment Group
8 consists of Segments 24, 42, 43, 44, and 46. This grouping is composed of families as
well. A Chi-Square test for the difference of equal proportions shows statistically
significant differences. However, when you look at their distributions, they do not differ
by much.


It is a reasonable expectation that the vocational composition differs at the ZIP Code
level. Aggregating the data into larger groupings may reveal similarities, at least in
the majority or major categories. We explore the data by:
1. Grouping data by FIPS Code (over 2600);
2. Grouping data by Metropolitan Statistical Areas (MSA) (over 1300);
3. Grouping data by State (49 - CONUS);
4. Grouping data by ASG (over 20);
5. Grouping data by RSC (10).


We decided to look at a summary categorization by RSC. There are 10 CONUS
RSCs and we also had data on the 9th ARCOM. We conducted a Chi-Square test for
similarities in the RSCs’ and the ARCOM’s vocations and lifestyle segmentation. The
results indicate the RSCs differ in segments and vocations. We performed the Chi-Square
test for similarities on the population raw data. Tables 4.1 and 4.2 are shown in
percentages for display only. One would not be able to see the difference with the raw
data, so we demonstrate the difference by using the percentages. The actual Chi-Square
value for the raw data is located at the bottom of each table.
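
The following sketch shows how a Pearson chi-square statistic (without continuity correction) is computed from a table of raw counts, and why raw counts matter. The small count tables are invented for illustration; only the degrees-of-freedom check for an 11 by 11 table corresponds to the layout of Tables 4.1 and 4.2.

```python
def pearson_chi_square(counts):
    """Return (statistic, degrees of freedom) for an r x c table of raw counts."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(counts):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    df = (len(counts) - 1) * (len(col_totals) - 1)
    return statistic, df

# Invented counts: the second table has the same percentages but 1,000 times the volume.
small = [[10, 20], [20, 10]]
large = [[n * 1000 for n in row] for row in small]

stat_small, df_small = pearson_chi_square(small)
stat_large, df_large = pearson_chi_square(large)

# An 11 x 11 layout (11 commands by 11 categories) has 10 * 10 degrees of freedom.
_, df_rsc = pearson_chi_square([[1] * 11 for _ in range(11)])
print(stat_small, stat_large, df_rsc)
```

Because the statistic scales with the number of observations, population-sized counts can produce a very large chi-square value even when the percentages look similar, which is consistent with testing on raw data and displaying percentages.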


The tables indicate they are very different in the percent of several vocations and
lifestyle segments. The vocational table shows those differences. Some are strikingly
different, such as the FAFOFISH and TRANSPO vocations.

RSC  EXECMNGE FAFOFISH ADMINSPT PROFSNL TECHSPT SVCOTHR SVCPROT SALES   CRFTSMAN LABORERS TRANSPO
9AR  22.562%  0.787%   7.850%   14.387% 3.003%  15.666% 2.115%  15.910% 8.383%   0.231%   9.107%
63rd 24.593%  0.557%   7.807%   15.421% 2.823%  11.562% 1.596%  14.977% 8.668%   0.267%   11.729%
70th 23.704%  0.965%   7.320%   15.479% 3.132%  11.167% 1.259%  14.442% 9.284%   0.277%   12.971%
90th 22.255%  0.573%   7.507%   14.450% 3.337%  10.730% 1.611%  15.006% 10.806%  0.352%   13.374%
96th 24.214%  0.982%   7.664%   15.036% 2.992%  10.754% 1.243%  15.040% 10.404%  0.356%   11.315%
89th 22.339%  1.185%   7.557%   13.911% 3.509%  10.729% 1.183%  14.639% 9.330%   0.249%   15.371%
81st 21.615%  0.484%   7.379%   13.505% 3.433%  10.646% 1.563%  15.146% 10.581%  0.324%   15.324%
88th 22.396%  0.469%   7.652%   14.419% 3.399%  10.342% 1.311%  14.497% 8.588%   0.212%   16.715%
99th 24.356%  0.315%   7.886%   16.211% 3.471%  10.264% 1.564%  14.131% 8.755%   0.263%   12.782%
77th 25.036%  0.175%   8.153%   16.551% 3.620%  10.965% 2.040%  14.803% 7.281%   0.213%   11.163%
94th 25.959%  0.237%   7.708%   17.392% 3.602%  10.125% 1.422%  14.121% 7.801%   0.240%   11.393%


NOTE: Chi Square Test for similarities conducted on Raw Data, not the Percentages 
Pearson's chi-square test without Yates' continuity correction: X-square = 2745863, df = 100, p-value = 0 


Table 4.1: Chi Square Testing of Vocational Aspects of RSCs

[The percentage of population vocations for each RSC. This table
demonstrates the difference in vocational composition of each RSC.]


RSC MVGP01 MVGP02 MVGP03 MVGP04 MVGP05 MVGP06 MVGP07 MVGP08 MVGP09 MVGP10 MVGP11 
9AR 37.134% 20.527% 6.874% 9.091% 1.319% 1.141% 0.525% 1.013% 22.155% 0.218% 0.004% 
63rd 22.663% 28.143% 3.519% 25.740% 0.544% 4.677% 0.255% 9.200% 5.072% 0.092% 0.096% 
70th 11.018% 49.577% 4.614% 21.450% 0.680% 6.755% 0.343% 1.564% 3.859% 0.076% 0.063% 
90th 9.849% 38.607% 8.702% 19.792% 1.304% 3.677% 0.625% 13.917% 3.341% 0.147% 0.037% 
96th 13.704% 48.161% 5.307% 20.734% 0.942% 4.604% 0.804% 2.027% 3.604% 0.090% 0.022% 
89th 7.191% 54.900% 7.660% 15.911% 1.182% 5.945% 1.028% 3.544% 2.508% 0.119% 0.011% 
81st 7.219% 42.573% 8.875% 18.346% 1.584% 6.200% 0.308% 12.434% 2.220% 0.193% 0.047% 
88th 9.795% 50.478% 4.963% 17.454% 0.715% 4.780% 0.421% 6.886% 4.390% 0.088% 0.031% 
99th 13.750% 45.112% 5.343% 17.415% 0.764% 4.562% 0.303% 7.460% 5.197% 0.091% 0.004% 
77th 14.285% 29.237% 3.068% 19.433% 0.598% 3.233% 0.242% 7.227% 22.525% 0.118% 0.033% 
94th 16.842% 37.993% 5.966% 25.841% 0.740% 4.739% 0.390% 2.802% 4.545% 0.093% 0.048% 


NOTE: Chi Square Test for similarities conducted on Raw Data, not the Percentages 
Pearson's chi-square test without Yates' continuity correction: X-square = 12788076, df = 100, p-value = 0 


Table 4.2: Chi Square Testing of Lifestyle Segmentation Grouping Aspects of
RSCs

[The percentage of the population lifestyle segment groupings for each RSC.
This table demonstrates the difference in lifestyle segment grouping composition of
each RSC.]

Similarly for the MV50 Segmentation information, the Chi Square Test reveals
the segmentation distribution of the RSCs differs. The MV50 Segment Groups table also
demonstrates those differences. Segment Groups MV50GP01, MV50GP02, MV50GP04,
MV50GP08, and MV50GP09 differ markedly across the RSCs. Using percentages
demonstrates the differences better than the raw data. There is one other noted feature of
the data.


Figure 4.2 captures the essence of the original segmentation information for the
contract data. The recruiter segment misclassification rate is 4.29% (segment 0 [4.06%]
and segment 99 [0.23%]). Of the MV50 Lifestyle Segments, nearly 50% of USAR contracts
come from the top ten segments. By concentrating on these top ten segments, recruiters
can realize nearly half of the total contract effort for the USAR. This information could
be incorporated into USAREC’s mission distribution model or recruiting policy.
Knowing the composition of the recruiting market could greatly assist USAREC,
USARC, and the recruiting force in accomplishing the annual accession mission for the
USAR.

Figure 4.2: MV50 Lifestyle Segmentation Distribution of USAR Contract Data


Market composition, and the number of contracts obtained by each market, is a
key component to understanding the recruiting environment. The more information
acquired about the recruiting environment, the better we can make use of the personnel
and monetary resources we have available. The additional information will enable us to
formulate better predictive models to assist in the recruiting effort.


Knowing the market and RSC composition should assist in the type of units 
placed in the RSC’s market. Predictive modeling will assist in unit stationing actions and 
prevent their poor placement in the market. The combination of these two pieces of 
information may greatly assist in future unit placement and stationing actions based on 


vocational, lifestyle segmentation, and unemployment aspects of RSC markets. 


C. MODEL FITTING - A LEARNED PROCESS


Model fitting is a science and an art. After data familiarization, our intent was to
treat all ZIP Codes equally. The way to achieve this was to place our tabular information
into proportions so we could make comparisons with ZIP Code information. Before
starting our model fitting, we also needed to review the unemployment rates of the
country. How does the unemployment rate affect the outcome of contracts?


Unemployment data will change over time. Time series model development may
be able to capture the unemployment rate over time, but we notice that the number of
contracts produced annually per ZIP Code is generally small.


Figure 4.3, provided by the BLS, shows the national unemployment average for
the period March 2003 through February 2004. Note that 5.9% is the national average.
The map indicates there are counties in the US employing more than 94.1% of their
labor force. Each county is clearly different.




The map demonstrates that the Midwest has the highest employment rates. This
may be misleading since a greater portion of the Midwest land is used for farming. Since
population density and number of jobs available are different than the rest of the country,
this information may contain employment bias. This bias may be in the farming,
forestry, and fishing vocation of the market.


[Map: Unemployment rates by county, March 2003 - February 2004 averages (U.S. rate = 5.9 percent)]

Figure 4.3: BLS Average Unemployment Rate by US County (Mar’03-Feb’04)


Our original approach was to treat each ZIP Code equally. We developed a
model using all our predictors. The expected proportion of contracts from a ZIP Code
should depend on the demographic composition of the market.


In this case, we used the MA population (1), unemployment rate (1), vocational
composition (11), and lifestyle segmentation composition (11) of the ZIP Code. This is a
total of 24 (multiple regression) predictor variables to determine the outcome of the
numbers of contracts a ZIP Code produces.

We tried four classes of models: modeling the proportion of MA population that
enlisted, modeling the log (proportion of contracts), modeling total contracts as a Poisson
random variable, and modeling total contracts as a Normal random variable. The
preliminary model results are:


• The model of proportions had low explanatory power, with an R-Squared of only
16%.

• The log-normal model required discarding about 10% of the data having zero
contracts. The resulting R-Squared was smaller, only a little over 12%.

• The Poisson model explained a little over 21% of the variation of the data.

• The Normal model did better, and we fully developed it.


D. MODEL FITTING — AVERAGE ANNUAL CONTRACTS 


Having described lifestyle segments and vocations, we can formulate and
continue to evaluate our regression models. The next model evaluated is a linear
regression model. The expected number of contracts from a ZIP Code should depend on
the demographic composition of the market.


In this case, we used the MA population (1), unemployment rate (1), vocational
composition (11), and lifestyle segmentation composition (11) of the ZIP Code. This is a
total of 24 (multiple regression) predictor variables to determine the outcome of the
numbers of contracts a ZIP Code produces. The model we develop has an intercept
value and a regression coefficient (slope) for each predictor variable. Recall that a
multiple linear regression model has the following form:


$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_j x_j + \varepsilon$$


Equation 4.2: General Form of Multiple Linear Regression Model

In our case, we have j equal to 24. We express the expected number of contracts
as a linear combination of the ZIP Code predictor variables. Keep in mind that there are
29,865 ZIP Codes in our table. This information has the following linear model (LM)
construct. The number of contracts is a linear function of (MA.POP, un.rate,
EXECMNGE, FAFOFISH, ADMINSPT, PROFSNL, TECHSPT, SVCOTHR,
SVCPROT, SALES, CRFTSMAN, LABORERS, TRANSPO, MV50GP01, MV50GP02,
MV50GP03, MV50GP04, MV50GP05, MV50GP06, MV50GP07, MV50GP08,
MV50GP09, MV50GP10, MV50GP11). Table 4.3 contains the detailed results from the
regression.


Not all variables in the regression appear to be significant. With this LM, we 
achieve a multiple R-Squared of 0.6934, compared with a 0.7737 correlation of MA 
population with contracts (i.e. MA population alone explains over 59% of the variation). 
About 10% is explained by demographics and vocations. The rest of the variation is 
likely to be due to policy (numbers of recruiters, station and recruiter placement, mission 


emphasis, goals, etc.). 
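
The split of explained variation quoted above follows from squaring the correlation coefficient. The short check below uses only the two figures given in the text (0.7737 and 0.6934); the variable names are ours.

```python
correlation_ma_pop = 0.7737        # correlation of MA population with contracts
r_squared_full = 0.6934            # multiple R-Squared of the 24-predictor model

# For a single predictor, R-Squared is the square of the correlation coefficient.
r_squared_ma_pop = correlation_ma_pop ** 2

# Additional share of variation explained once vocations, segments,
# and the unemployment rate are added to MA population.
additional_share = r_squared_full - r_squared_ma_pop

print(round(r_squared_ma_pop, 4), round(additional_share, 4))
```

MA population alone accounts for roughly 59.9% of the variation, and the remaining predictors add a little under 10 percentage points, leaving about 30% unexplained.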


We see that SALES, TRANSPO, MV50GP01, and MV50GP11 appear to be
insignificant in our table, as they all have p-values that exceed 0.05. This indicates that
their respective coefficient values in the regression equation may be 0. We remove them
from the regression and see that the R-Squared does not change much.


We next look at the model’s coefficients. They tend to be small due to the scale
of the predictors. We are predicting the average annual number of USAR contracts
achieved in each ZIP Code. Does the order of magnitude make sense? The answer is
yes. We have 29,865 ZIP Codes and a USAR NPS mission of 20,000. If each ZIP Code
produces an average of one contract per year, we would have 29,865 contracts.

*** Linear Model ***

Call: lm(formula = AR.Avg ~ MA.POP + un.rate + EXECMNGE + FAFOFISH
+ ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN +
LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 +
MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 +
MV50GP11, data = ALLDATAbyZIP2, na.action = na.exclude)

Residuals:    Min    1Q    Median    3Q    Max
              [values illegible in source]

Coefficients    Value         Std.Error     t-value       Pr(>|t|)

(Intercept)     0.1388        [illegible]   9.2169        0.0000
MA.POP          0.0001        0.0000        18.5959       0.0000
un.rate         -1.5446       0.2260        -6.8339       0.0000
EXECMNGE        -0.0002       0.0000        -13.5070      0.0000
FAFOFISH        -0.0006       0.0000        [illegible]   0.0000
ADMINSPT        0.0007        0.0000        [illegible]   0.0000
PROFSNL         0.0001        0.0000        4.7454        0.0000
TECHSPT         0.0008        0.0000        [illegible]   0.0000
SVCOTHR         0.0001        0.0000        [illegible]   0.0000
SVCPROT         0.0005        0.0000        [illegible]   0.0000
SALES           0.0000        0.0000        0.3048        0.7606
CRFTSMAN        -0.0002       0.0000        [illegible]   0.0000
LABORERS        -0.0021       0.0003        [illegible]   0.0000
TRANSPO         0.0000        0.0000        [illegible]   0.1468
MV50GP01        0.0000        0.0000        1.4467        0.1480
MV50GP02        0.0000        0.0000        [illegible]   0.0000
MV50GP03        [illegible]   0.0000        40.9519       0.0000
MV50GP04        0.0000        0.0000        -6.0032       0.0000
MV50GP05        -0.0014       0.0002        -6.5840       0.0000
MV50GP06        -0.0003       0.0000        -14.9495      0.0000
MV50GP07        -0.0012       0.0003        -4.3682       0.0000
MV50GP08        [illegible]   0.0000        14.9848       0.0000
MV50GP09        -0.0001       0.0000        -9.2602       0.0000
MV50GP10        -0.0014       0.0005        [illegible]   0.0099
MV50GP11        0.0001        0.0001        0.6916        0.4892


Residual standard error: 0.8929 on 29839 degrees of freedom 

Multiple R-Squared: 0.6934 

F-statistic: 2811 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 


Table 4.3: S-Plus Linear Regression Model Formulation for Number of USAR 
Contracts 


Table 4.4 shows the results of removing variables, the resulting multiple R-Squared,
and the regression df. One of the last predictor variables removed is MA.POP.
We see that even removing MA.POP as a predictor does not change the amount of
explained variation. We removed 11 predictor variables with very little change in the
amount of explained variation in our LM. This suggests those variables are insignificant
and do not contribute to the overall explanation of variation in the number of contracts a
ZIP Code produces. A simpler model yielding the same R-Squared is usually preferred.

VARIABLE REMOVED       MULTIPLE R-SQUARED   RGRSN DF

SALES                  0.6934               23
MV50GP11               0.6934               22
TRANSPO                0.6933               21
MV50GP01, MV50GP10     0.6933               19
MV50GP05, MV50GP06     0.6902               17
MV50GP07, MV50GP09     0.6877               15
SVCOTHR, MA.POP        0.6821               13


Table 4.4: Predictor Variable Removal and Multiple R-Squared Results 


After subsetting, we obtain the model in Table 4.5. Note that, as in Table 4.3, the
coefficients of some of the predictor variables are negative. This indicates the number of
contracts a ZIP Code produces is negatively associated with the size of the variable in the
ZIP Code. For example, we see that un.rate, EXECMNGE, FAFOFISH, LABORERS,
and CRFTSMAN all have negative coefficients. The larger the unemployment rate, or
the greater the proportion in these vocations, the fewer contracts the ZIP Code can
produce. In particular, for every 10,000 LABORERS in the ZIP Code, the expected
annual average number of USAR contracts decreases by 28.

Residuals:    Min    1Q    Median    3Q    Max
              [values illegible in source]

Coefficients    Value         Std.Error     t-value       Pr(>|t|)

(Intercept)     [illegible]   0.0150        8.4104        0.0000
un.rate         -1.8258       0.2269        -8.0468       0.0000
EXECMNGE        -0.0002       0.0000        [illegible]   0.0000
FAFOFISH        -0.0004       0.0000        [illegible]   0.0000
ADMINSPT        0.0009        0.0000        44.8462       0.0000
PROFSNL         0.0002        0.0000        11.9365       0.0000
TECHSPT         0.0007        0.0000        21.0479       0.0000
SVCPROT         0.0003        0.0000        [illegible]   0.0000
CRFTSMAN        -0.0001       0.0000        [illegible]   0.0000
LABORERS        -0.0028       0.0003        [illegible]   0.0000
MV50GP02        0.0000        0.0000        3.7429        0.0002
MV50GP03        0.0073        0.0000        [illegible]   0.0000
MV50GP04        [illegible]   0.0000        -4.8638       0.0000
MV50GP08        0.0002        0.0000        26.0054       0.0000


Residual standard error: 5.454 on 29850 degrees of freedom

Multiple R-Squared: 0.6821

F-statistic: 4926 on 13 and 29850 degrees of freedom, the p-value is 0
1 observations deleted due to missing values


Table 4.5: S-Plus Linear Regression Model Formulation for Number of USAR
Contracts (iteration 12)


Table 4.5 demonstrates the resulting regression equation after 11 iterations of
variable removal. The amount of explained variation is still greater than 68%. Since the
“full” model had over 69% explained variation and the amount of explained variation is
greater than 68% with 11 of our original variables removed, we use the simpler model.
Determining the significance of predictor variables is a way to achieve a simpler, more
effective model.


A good tool used to verify the model is to plot the data and look at its appearance.
We can achieve this in two plots. The first is the actual data versus the “fitted” data. The
fitted data is the predicted value or outcome using the regression equation. The second is
the fitted data versus the residuals. The residuals are the deviations of the actual values
from the fitted values, that is, from the regression’s estimate of the mean number of
contracts for a ZIP Code with those predictor values.


Figure 4.4 shows the graph of the USAR actual average annual number of 
contracts and the USAR fitted average annual number of contracts. There are some 
values in the data that largely deviate from the regression model. These values are 
outliers. If you remove them from the regression and the slope of the regression line 
greatly changes, then they are large influencers. Normally, a determination needs to be 
made on outlier exclusion or inclusion. Since we have nearly 30,000 data points in our 


regression, we will disregard these outliers. 


The data should have a strong linear look to have a good LM fit. The graph 
appears generally linear. Notice the strong concentration of data points from 0 to 
approximately 6.5 annual contracts. This indicates that the predictions for the average 


annual number of USAR contracts should be fairly accurate in this region of the model. 




[Plot axes: ALLDATAbyZIP2$AR.Avg and fitted(ARAvgvsALL)]

Figure 4.4: Graph of Army Reserve Average Annual Contracts versus Army
Reserve Fitted Average Annual Contracts


Figure 4.5 shows the graph of the USAR fitted average annual number of
contracts and residuals. The points on the plot should be randomly scattered throughout
the plot for the model to have proper fit to the data. Any pattern indicates a departure
from the model assumptions. We see that there is a linear relationship at the bottom left
of our plot. Normally this indicates some kind of dependence in the data. The assumption
is independent errors with homogeneous variance.


Figure 4.5 would normally indicate heterogeneity of the variance, but we know
this data. We tried and discarded a log model because the number of contracts for some
ZIP Codes was zero. A log transformation would therefore not be appropriate here. We
might consider other transformations like log(n+1), where n is the number of contracts in
the ZIP Code, or the substitution of 0.001 for those ZIP Codes which produced zero
contracts.


[Plot axes: residuals(ARAvgvsALL) and fitted(ARAvgvsALL)]

Figure 4.5: Graph of Army Reserve Fitted Average Annual Contracts versus
Residuals


Figure 4.5 appears to indicate heteroscedasticity because the number of contracts 
is either zero or positive. Constant variance would plot the residuals scattered about the 
graph without pattern or shape. If it were not for this phenomenon, we would see the 


bottom left of the plot filled with data points as well. 
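
The straight edge at the bottom left of the residual plot has a simple arithmetic explanation: a residual is the actual value minus the fitted value, and the actual number of contracts cannot be negative, so no residual can fall below the negative of its fitted value. The sketch below illustrates this bound. The (1.00, 1.47) and (2.83, 2.15) pairs are the Monterey and Seaside figures reported in this chapter; the three zero-contract pairs are invented for illustration.

```python
# (actual, fitted) pairs for average annual contracts.
# Rows with actual == 0.0 are hypothetical; the other two come from the text.
pairs = [(0.0, 0.4), (0.0, 1.2), (1.00, 1.47), (2.83, 2.15), (0.0, 0.9)]

residuals = [actual - fitted for actual, fitted in pairs]

# Because actual >= 0, every residual satisfies residual >= -fitted.
lower_bounds = [-fitted for _, fitted in pairs]

# ZIP Codes with zero contracts sit exactly on the line residual = -fitted.
on_boundary = [
    (fitted, residual)
    for (actual, fitted), residual in zip(pairs, residuals)
    if actual == 0.0
]
print(on_boundary)
```

All zero-contract ZIP Codes therefore line up along a line of slope -1 through the origin, and the region below that line is empty by construction rather than because of the model fit.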


Let’s look at an example problem for a few ZIP Codes to see how our regression
equation performs. Since we are here at the Naval Postgraduate School in Monterey,
CA, we will use ZIP Code 93940. Keep in mind we are using our smaller derived model.
The unemployment rate for Monterey ZIP Code 93940 is 10.44%. Table 4.6 has the
remaining values for the predictor variables. The table construct is such that we can
compute the dot product of the predictor values and the coefficients of the regression
equation to produce the estimated number of contracts for the ZIP Code.
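
The dot-product calculation described above can be sketched as follows. The three predictors, their coefficients, and their values are hypothetical round numbers chosen for readability; they are not the Table 4.5 coefficients, which are published to only four decimal places and so cannot reproduce the reported predictions exactly.

```python
# Hypothetical coefficients for a reduced model (illustration only).
coefficients = {"(Intercept)": 0.10, "un.rate": -2.0, "ADMINSPT": 0.001}

# Hypothetical predictor values for one ZIP Code; the intercept always multiplies 1.
values = {"(Intercept)": 1, "un.rate": 0.10, "ADMINSPT": 1500}

def predict_contracts(coefs, zip_values):
    """Dot product of regression coefficients and one ZIP Code's predictor values."""
    return sum(coefs[name] * zip_values[name] for name in coefs)

predicted = predict_contracts(coefficients, values)
print(round(predicted, 2))
```

Laying each ZIP Code out as a coefficient column beside a value column, as in the tables that follow, makes this sum of products easy to audit by hand.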


MONTEREY: 93940

Predictor       Coefficient    Values
(Intercept)     [illegible]    1
un.rate         [illegible]    0.1044
EXECMNGE        -0.0002        [illegible]
FAFOFISH        -0.0004        191
ADMINSPT        0.0009         [illegible]
PROFSNL         0.0002         8756
TECHSPT         0.0007         1326
SVCPROT         [illegible]    [illegible]
CRFTSMAN        -0.0001        2806
LABORERS        [illegible]    143
MV50GP02        0.0000         1584
MV50GP03        [illegible]    154
MV50GP04        -0.0000        6347
MV50GP08        0.0001         6

PREDICTED: 1.47
ACTUAL: 1.00


Table 4.6: Annual USAR Contract Prediction Results for Monterey, CA 93940 


Looking at the historical information of the ZIP Code for Monterey, we find the
range of contracts is (0, 3). The six-year average for the ZIP Code is 1 contract per year.
This is another reason not to do monthly time series analysis: we would have mostly
zeros in our data. The annual predicted number of contracts is 1.47. The difference is
0.47 contracts. The 95% confidence interval of the prediction is (1.41, 1.54) with a
standard error of 0.03. We could obtain a confidence interval for our raw contract data if
we tested the values for normality and tested the residuals. Since we only have 6 data
points (the annual numbers of contracts) in our sample for each ZIP Code, this approach
would be futile. This makes the regression worth the effort.
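
A confidence interval of this kind is commonly formed as the estimate plus or minus a critical value times the standard error. The sketch below assumes the normal critical value of 1.96 and uses the rounded standard error quoted in the text; the exact method behind the reported interval is not stated, so small differences in the second decimal place are to be expected.

```python
def normal_interval(estimate, std_error, z=1.96):
    """Approximate 95% interval: estimate +/- z * standard error."""
    half_width = z * std_error
    return estimate - half_width, estimate + half_width

# Monterey 93940: predicted 1.47 contracts with a (rounded) standard error of 0.03.
low, high = normal_interval(1.47, 0.03)
print(round(low, 2), round(high, 2))
```

With these rounded inputs the lower bound matches the reported 1.41, while the upper bound comes out marginally below the reported 1.54, which is the size of discrepancy that rounding the standard error to two decimals would produce.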


Table 4.7 demonstrates the same information for one Salinas, CA ZIP Code,
93901 (note that some cities, like Salinas, have more than one ZIP Code associated with
them). In the Salinas ZIP Code 93901 case, the annual predicted number of contracts is
1.68. The difference is 0.68 contracts. The 95% confidence interval of the prediction is
(1.55, 1.81) with a standard error of 0.07.

SALINAS: 93901

Predictor       Coefficient    Values
(Intercept)     0.1256         1
un.rate         [illegible]    0.1044
EXECMNGE        -0.0002        6702
FAFOFISH        -0.0004        1434
ADMINSPT        0.0009         2405
PROFSNL         0.0002         2792
TECHSPT         0.0007         1074
SVCPROT         0.0003         1120
CRFTSMAN        -0.0001        3067
LABORERS        -0.0026        70
MV50GP02        0.0000         4159
MV50GP03        [illegible]    269
MV50GP04        -0.0000        [illegible]
MV50GP08        0.0002         476

PREDICTED: 1.68
ACTUAL: 1.00


Table 4.7: Annual USAR Contract Prediction Results for Salinas, CA 93901 


SEASIDE: 93955

Predictor      Coefficient    Value
(Intercept)    0.1255         1
un.rate        [illegible]    0.1044
EXECMNGE       -0.0002        5274
FAFOFISH       -0.0004        478
ADMINSPT       0.0009         2280
PROFSNL        0.0002         3455
TECHSPT        0.0007         705
SVCPROT        0.0003         520
CRFTSMAN       -0.0001        3059
LABORERS       -0.0026        63
MV50GP02       0.0000         4353
MV50GP03       0.0013         454
MV50GP04       -0.0000        [illegible]
MV50GP08       0.0002         954

PREDICTED: 2.15
ACTUAL: 2.83

Table 4.8: Annual USAR Contract Prediction Results for Seaside, CA 93955


Likewise, looking at the historical information of ZIP Code 93901 for Salinas,
we find the range of contracts is (0, 2) and the six-year average for the ZIP Code is 1
contract.


Finally, we look at Seaside, CA. Table 4.8 demonstrates the information for the
Seaside, CA ZIP Code 93955. For Seaside ZIP Code 93955, the annual predicted
number of contracts is 2.15. The 95% confidence interval of the prediction is (2.11, 2.20)
with a standard error of 0.02. The range of contracts is (2, 6) and the difference is 0.68
contracts.


These predicted values will become the max_recruits_zip_i parameter for the eventual
LP model. That is, the maximum number of recruits obtained at ZIP Code i is the
predicted number of recruits obtained from ZIP Code i, floored at zero. The
equation is as follows:


\[ \text{max\_recruits\_zip}_i = \max\left(0,\ \widehat{\text{AR.Avg}}_i\right) \]


Equation 4.3: Maximum Number of Recruits Formula


There were about 120 negative predicted values, and we set them to zero in
Equation 4.3.
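
Equation 4.3 amounts to flooring each prediction at zero. A minimal Python sketch follows; the function name mirrors the LP parameter, and the ZIP Code "00000" with its negative prediction is invented purely to show the clipping.

```python
def max_recruits_zip(predicted_contracts):
    """Equation 4.3: floor a predicted annual contract count at zero."""
    return max(0.0, predicted_contracts)


# Predictions keyed by ZIP Code. "00000" and its negative value are
# hypothetical, included only to exercise the zero floor.
predictions = {"93940": 1.47, "93901": 1.68, "93955": 2.15, "00000": -0.12}

bounds = {zip_code: max_recruits_zip(p) for zip_code, p in predictions.items()}
print(bounds)
```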


E. MODEL FITTING — TOP FIVE MOSs 


The next item brought out in this analysis is the maximum number of recruits at
ZIP Code i for MOS j. This is the maximum of zero and the minimum of the predicted
number of contracts for MOS j in ZIP Code i and the predicted number of recruits
obtained at ZIP Code i. The formulation of the equation for this parameter in the
eventual LP is:


\[ \text{max\_recruits\_zip\_mos}_{ij} = \max\left(0,\ \min\left(\widehat{\text{MOS}}_{ij},\ \widehat{\text{AR.Avg}}_i\right)\right) \]


Equation 4.4: Maximum Number of Recruits by MOS Formula


This keeps MOS predictions non-negative and within the total production.
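
Equation 4.4 can be sketched the same way. The first call below uses the Monterey figures reported in this chapter (1.13 predicted for MOS 52D, 1.47 predicted in total); the other two MOS predictions are hypothetical values chosen to show the cap and the floor.

```python
def max_recruits_zip_mos(mos_prediction, zip_prediction):
    """Equation 4.4: cap the MOS prediction at the ZIP Code total, floor at zero."""
    return max(0.0, min(mos_prediction, zip_prediction))


zip_total = 1.47  # predicted annual contracts for ZIP Code 93940

within_total = max_recruits_zip_mos(1.13, zip_total)   # unchanged: below the total
capped = max_recruits_zip_mos(1.90, zip_total)         # hypothetical: capped at the total
floored = max_recruits_zip_mos(-0.05, zip_total)       # hypothetical: floored at zero
print(within_total, capped, floored)
```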


We now turn our attention to modeling the top five MOSs. This information is
located in Appendix G (Top Five MOS Regression Equations). The current top five
MOSs are 52D, 74D, 77F, 88M, and 95B. We followed the same procedures for the
MOS predictions as we did for the annual number of USAR contracts.


However, since we do not yet know the importance of individual predictor variables in
our models, and since including insignificant variables has little effect on the
prediction, we employ the full model for our top five MOSs. Recall that Phase II
will model all 264 MOSs in detail. Phase II will also determine the
significance of the predictor variables.


The full model has the following LM construct. The actual number of contracts
that qualified for MOS j in ZIP Code i, regardless of contracted MOS, is a linear
function of (MA.POP, un.rate, EXECMNGE, FAFOFISH, ADMINSPT, PROFSNL,
TECHSPT, SVCOTHR, SVCPROT, SALES, CRFTSMAN, LABORERS, TRANSPO,
MV50GP01, MV50GP02, MV50GP03, MV50GP04, MV50GP05, MV50GP06,
MV50GP07, MV50GP08, MV50GP09, MV50GP10, MV50GP11). Appendix G
contains the detailed results from the regression.


[Figure 4.6 plot: ALLDATAbyZIP2$q.52D.Avg.Annl (vertical axis) versus
fitted(lm.52D.VocSegFull) (horizontal axis)]


Figure 4.6: Graph of Average Annual USAR Contracts Qualifying for MOS 52D
versus Fitted Average Annual USAR Contracts Qualifying for MOS 52D


As with the predicted number of contracts, we ran diagnostic plots on actual
versus fitted and fitted versus residuals. The plots in Figures 4.6 and 4.7 appear to be
satisfactory. Notice Figure 4.7 has the same “shoulder” on the fitted versus residual plot.
Again, this is because neither the number of contracts nor the number of recruits who
qualify for a particular MOS in a ZIP Code can be negative.


[Figure 4.7 plot: residuals(lm.52D.VocSegFull) (vertical axis) versus
fitted(lm.52D.VocSegFull) (horizontal axis, ticks at 0, 2, 4, 6)]


Figure 4.7: Graph of Fitted Average Annual USAR Contracts Qualifying for
MOS 52D versus Residuals


The plots for the other four top MOSs (74D, 77F, 88M, and 95B), located in Appendix G,
are very similar for both fitted versus actual and fitted versus residuals. As with our
predicted number of contracts model, let’s look at an example. To keep it simple, we will
use the same ZIP Codes (93901, 93940, and 93955) as previously.


How does our regression equation perform? Keep in mind we are using the full
model because we will eventually want to examine MOS j in ZIP Code i. To make
the comparison, we need to examine the same model for each MOS. In Phase II each
MOS will have its own model. We construct Table 4.9 in the same manner as before,
so that we can compute the dot product of the predictor values and the coefficients of the
regression equation to obtain the annual average number of USAR contracts qualifying for
MOS 52D in the ZIP Code.


MONTEREY: 93940

Predictor      Coefficient    Value
(Intercept)    0.1070         1
MA.POP         0.0001         5484
un.rate        [illegible]    0.1044
EXECMNGE       -0.0001        12280
FAFOFISH       -0.0003        191
ADMINSPT       0.0002         2698
PROFSNL        0.0001         8756
TECHSPT        0.0004         1326
SVCOTHR        0.0000         5016
SVCPROT        -0.0001        396
SALES          [illegible]    500
CRFTSMAN       -0.0001        2806
LABORERS       -0.0009        145
TRANSPO        0.0000         2104
MV50GP01       0.0000         3269
MV50GP02       0.0001         1584
MV50GP03       0.0007         154
MV50GP04       [illegible]    6347
MV50GP05       -0.0009        [illegible]
MV50GP06       [illegible]    446
MV50GP07       0.0001         0
MV50GP08       [illegible]    6
MV50GP09       [illegible]    128
MV50GP10       [illegible]    5
MV50GP11       0.0000         0

PREDICTED: 1.13
ACTUAL: 0.39

Table 4.9: Average Annual USAR Contracts Qualified for MOS 52D Prediction
Results for Monterey, CA 93940


Looking at the information of ZIP Code 93940 for Monterey, we find the actual
average number of contracts qualifying for MOS 52D is 0.39 contracts. According to our
model formulation, this value is not a rate, but rather the maximum number of recruits
qualifying in ZIP Code 93940. The annual predicted number of contracts is 1.13 and the
difference is 0.72 contracts. The 95% confidence interval is (1.09, 1.17) with a
standard error of 0.02.


Similarly, Tables 4.10 and 4.11 demonstrate the same information for one Salinas,
CA ZIP Code, 93901, and one Seaside, CA ZIP Code, 93955. The differences are 0.16
and 0.13, respectively. The annual predicted number of contracts for ZIP Code 93901 is
0.79 and the 95% confidence interval is (0.71, 0.88) with a standard error of 0.04.
The annual predicted number of contracts for ZIP Code 93955 is 1.56 and the 95%
confidence interval is (1.47, 1.64) with a standard error of 0.04.


SALINAS: 93901

Predictor      Coefficient    Value
(Intercept)    0.1070         1
MA.POP         0.0001         4374
un.rate        [illegible]    0.1044
EXECMNGE       -0.0001        6702
FAFOFISH       -0.0003        1434
ADMINSPT       0.0002         2405
PROFSNL        0.0001         4252
TECHSPT        0.0004         1074
SVCOTHR        0.0000         3610
SVCPROT        -0.0001        1120
SALES          [illegible]    4879
CRFTSMAN       -0.0001        3067
LABORERS       -0.0009        70
TRANSPO        0.0000         3994
MV50GP01       0.0000         [illegible]
MV50GP02       0.0001         4159
MV50GP03       0.0007         269
MV50GP04       [illegible]    3152
MV50GP05       -0.0009        39
MV50GP06       [illegible]    667
MV50GP07       0.0001         6
MV50GP08       [illegible]    476
MV50GP09       [illegible]    263
MV50GP10       [illegible]    [illegible]
MV50GP11       0.0000         0

PREDICTED: 0.79
ACTUAL: 0.63

Table 4.10: Average Annual USAR Contracts Qualified for MOS 52D Prediction
Results for Salinas, CA 93901
Vocational and demographic composition have a considerable effect on the
outcome of the regression. Recall our regression equation for each MOS uses the full
model. Any large increase or decrease in demographic composition will have an effect
on the prediction. As we review Appendix G and peruse the outcome of the vocations,
segments, MA population, and unemployment rate coefficients, we note that MOSs have
different coefficients, indicating larger or smaller influences of these factors in the ZIP
Code.


SEASIDE: 93955

Predictor      Coefficient    Value
(Intercept)    0.1070         1
MA.POP         0.0001         6528
un.rate        [illegible]    0.1044
EXECMNGE       -0.0001        5274
FAFOFISH       -0.0003        478
ADMINSPT       0.0002         2280
PROFSNL        0.0001         3455
TECHSPT        0.0004         705
SVCOTHR        0.0000         8820
SVCPROT        -0.0001        520
SALES          [illegible]    4464
CRFTSMAN       -0.0001        3059
LABORERS       -0.0009        63
TRANSPO        0.0000         3440
MV50GP01       0.0000         [illegible]
MV50GP02       0.0001         4353
MV50GP03       0.0007         454
MV50GP04       [illegible]    [illegible]
MV50GP05       -0.0009        58
MV50GP06       [illegible]    36
MV50GP07       0.0001         46
MV50GP08       [illegible]    954
MV50GP09       [illegible]    267
MV50GP10       [illegible]    5
MV50GP11       0.0000         0

PREDICTED: 1.56
ACTUAL: 1.43

Table 4.11: Average Annual USAR Contracts Qualified for MOS 52D Prediction
Results for Seaside, CA 93955


For example, if we compare MOS 52D with MOS 95B, we notice MV50 Segment
Groups 1, 7, and 11 appear to be statistically insignificant for MOS 52D. By contrast,
MV50 Segment Groups 1, 8, and 11 appear to be statistically insignificant for
MOS 95B. This also occurs with the vocations. The LM for MOS 52D does not appear
to contain the SVCOTHR vocation, while the LM for MOS 95B does not appear to
contain the TRANSPO vocation.

Table 4.12 has the value of
\( \text{max\_recruits\_zip\_mos}_{ij} = \max\left(0,\ \min\left(\widehat{\text{MOS}}_{ij},\ \widehat{\text{AR.Avg}}_i\right)\right) \)
for each of the MOSs for the three example ZIP Codes.


ZIP Code    52D      74D      77F      88M      95B      MAX
93901       0.790    0.940    1.220    1.250    1.150    1.250
93940       1.130    1.260    1.460    1.470    1.420    1.470
93955       1.560    1.840    2.150    2.150    2.150    2.150


Table 4.12: Maximum Number of Recruits Qualifying for the USAR Top Five
MOSs for ZIP Codes 93901, 93940, and 93955


Note that the number qualifying varies by ZIP Code. The point of the analysis is
that not all MOSs are equally supportable. We must consider ZIP Code supportability by
MOS to obtain correct unit positioning. This variation is the reason for the USAR unit force
structure optimal stationing model that we are developing.


There are similarities in the LM development for MOS j in ZIP Code i, but
Phase II analysis must develop a model for each MOS. The basis for the LM formulation
in Phase II can be the full model developed herein for the top five MOSs.


We now have the two inputs for the Phase III model. Phase II of this analysis will
develop the regression equations for the remaining 259 MOSs.


F. MODELING OUTCOME 


Data analysis can only reveal some of the predictive tendencies of a modeled
environment. Some analyses may not be able to show peculiarities in the data. We
hypothesized that segmentation, vocational, and unemployment information of
markets at ZIP Code detail would have predictive capability on the number of contracts a
ZIP Code can produce.
Our developed LM explains about 70% of the variation in the data. The
remaining variation, with respect to our variables in the data, is assumed random.
However, it appears as though there is some other phenomenon that would explain the
remaining data variation. As previously stated, the RSCs differ in their segment and
vocational composition. This information suggests that there are remaining
non-demographic factors influencing the number of contracts produced by ZIP Code.
Regionalization may have a discernible effect on the data. Grouping by FIP, MSA, State,
ASG, RSC, etc. may be a way to gain more predictive power with the model.


The focus of this analysis was to be able to predict the number of contracts a ZIP
Code could produce based on market segments, vocational information, and
unemployment rates. We developed a useful model that has 70% predictive power; that
is, we were able to explain about 70% of the variation of the data.


The number of NPS contracts a ZIP Code can produce may depend on additional
aspects of the recruiting process not considered here. Production may rely on mission quotas,
mission levels, policy, etc. Another model to predict NPS contracts may be found in the
historical contract data. One may be able to examine the historical data with respect to
each market, provided structure has not changed, and develop a predictive model based
on some kind of mean or moving average to smooth the data.
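
A moving average of the kind suggested here could look like the following sketch. The six annual counts are hypothetical, and the three-year window is an arbitrary choice for illustration; neither comes from the thesis data.

```python
def moving_average(series, window):
    """Simple moving average over consecutive windows of the given length."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    return [
        sum(series[start:start + window]) / window
        for start in range(len(series) - window + 1)
    ]


# Hypothetical annual contract counts for one ZIP Code over six fiscal years.
annual_contracts = [0, 1, 3, 1, 0, 1]

smoothed = moving_average(annual_contracts, window=3)
print([round(value, 2) for value in smoothed])
```

With only six annual observations per ZIP Code, any window leaves very few smoothed points, which is one reason the monthly arrangement discussed next may be more informative.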


One of the exploratory data methods not considered in this analysis is time series.
Time series analysis requires the collection of data elements by time interval. We could have
arranged the contract data by month. To accomplish this, we could have included the
actual contract date of the accession. The vocational information should not change
much over time. Likewise, the segmentation would not change much over time. We
could collect the unemployment rates by month for the same time period.
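
Arranging contracts by month from their contract dates might be done as in the sketch below. The dates are hypothetical, and `monthly_contract_counts` is an illustrative helper rather than part of the Clementine streams described in this thesis.

```python
from collections import Counter
from datetime import date


def monthly_contract_counts(contract_dates):
    """Count contracts per (year, month) from a list of contract dates."""
    return Counter((d.year, d.month) for d in contract_dates)


# Hypothetical contract dates for a single ZIP Code.
contract_dates = [date(2002, 10, 3), date(2002, 10, 21), date(2003, 2, 14)]
counts = monthly_contract_counts(contract_dates)

# Months with no contracts are absent from the Counter and read as zero.
print(counts[(2002, 10)], counts[(2002, 11)], counts[(2003, 2)])
```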


Constructing the collection and subsequent analysis in this manner may lead to a
better predictive model. This data may have seasonality associated with it. When we
manipulated and assembled the current data, we used a single data point to summarize six
years of data for each ZIP Code. This information may be bound by the construct of one
data point to represent the entire 6-year period (FY98-FY03). What may be more
appropriate is to obtain the monthly data for the ZIP Code and investigate time series
performance of the data. This approach may lead to a more useful predictor and a
better understanding of the data peculiarities.


Regionalization may have a discernible effect on the data. State, RSC, MSA, etc.
may be a way to gain more predictive power with modeling. We see that the vocations
and lifestyle segments for each RSC are statistically different in composition. The market
composition has a great deal to do with the number of contracts USAREC obtains from
the markets. We see that our modeling effort has predictive power and helps explain
which variables merit inclusion in a model of the annual number of contracts.


We may be able to use an indicator variable for each of the regions (1, 2, ..., 10) to
capture regional effects. This regional effect would then be translated into the intercept
of our regression.
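
One common way to encode such regional indicators is sketched below. It assumes region 1 serves as the baseline, so ten regions need nine indicator variables alongside the intercept; the baseline choice and the effect sizes are assumptions of this example, not results from the thesis.

```python
def region_indicators(region, n_regions=10):
    """Return n_regions - 1 indicator variables; region 1 is the baseline."""
    if not 1 <= region <= n_regions:
        raise ValueError(f"region must be between 1 and {n_regions}")
    return [1 if region == k else 0 for k in range(2, n_regions + 1)]


def regional_intercept(base_intercept, region_effects, region):
    """Intercept after adding the fitted effect of the ZIP Code's region."""
    dummies = region_indicators(region, len(region_effects) + 1)
    return base_intercept + sum(e * d for e, d in zip(region_effects, dummies))


# Hypothetical effects for regions 2..10, each relative to region 1.
effects = [0.05, -0.02, 0.00, 0.10, -0.07, 0.03, 0.01, -0.04, 0.02]

print(region_indicators(3))
print(regional_intercept(0.12, effects, 1))            # baseline region
print(round(regional_intercept(0.12, effects, 5), 2))  # shifted by region 5's effect
```

Dropping one indicator avoids perfect collinearity with the intercept, which is why each regional effect shows up as a shift in the intercept rather than as a separate slope.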


The development of the MOS data also has predictive capability. We are able to
explain about 65% of the variation in the data for the top five MOSs. This suggests that
our model is useful in explaining the variation of the data by MOS and ZIP Code. We
note that the number of contracts a ZIP Code yields may also vary
with the amount of effort a recruiter places on achieving his mission.


It appears as though our developed model is plausible and generates new
conclusions about the data. The next section addresses further considerations.


G. POSITIONING UNITS TO OPTIMIZE OTHER METRICS, GIVEN 95% 
(OR OTHER LEVEL) OPTIMAL FILL. 


1. Cost (Incentives, reorganization, transportation, etc.). 

The cost implications of relocating structure are a function of whether there is an
RC in the newly determined location. If an RC exists at a location, the cost would be the
cost of relocating structure to the new area plus the cost of relocating current structure, if
applicable, to another area.

Costing of this information can be ascertained in the overall model developed in
Phase III of this project. These metrics can be included in the LP. Once obtained, they
can be optimized in the same manner.


2. Geographical balance (the HLS connection).

After consideration of moving and repositioning unit structure, we need to redress
the geographical balance of our force structure distribution. As previously stated, the
USAR constructs its RSCs around the FEMAs. If the new structure has the desired
vocations necessary to complete its FEMA missions, then we do not have an impact. We
maintain the geographical balance. However, if the new structure does not have the
desired vocations, then we may consider moving the needed structure to support the
FEMAs, changing some of the FEMA missions to accommodate the new structure, or
doing a combination of both.


H. POSITIONING UNITS TO OPTIMIZE FILL RATE, GIVEN OTHER 
METRICS AS CONSTRAINTS 


The question remains: how do we position the structure with respect to the market?
Since we obtained the regression equations for the top five MOSs in the inventory, we
can begin to position the force structure within the markets by using the equations and the
follow-on Phase II equations as predictors. The factors of volume, quality,
unemployment rate, vocations, etc. determine the supportable force structure
composition. The ability of the market to support TPU structure at its current location is
key to successfully determining the structure location.


The regression equations for each MOS forecast the support of the MOS in the
market. We can augment those markets that do not obtain the appropriate level of MOSs with
advertising or regionally based incentives. Offering educational benefits, MOS bonuses, or some
other enticement may cause sufficient quantities of qualified MA to join those units. In most
cases, we are not too far off desired unit fill rates. We may increase our fill rates by
offering these inducements.
There may be several reasons why some areas are successful and other areas are not.
However, this study demonstrates the value of quality assessment information. Together,
historical quality information, the vocational information of the area, and production
information indicate what type of structure is most successful at each RC. The study also
demonstrates that the most prominent vocations of an area can be related to the type of
structure the USAR places there to be supported.


If the other Sister Services are willing to share their production data, we could
determine the overall effect of the study on Department of Defense recruiting,
retention, and structure placement efforts.


I. CRITICAL ASSUMPTIONS 


The first critical assumption is that human factors do not influence the outcome of the
analysis (i.e., “All recruiters and commanders are created equal”). The second is that the
“best” distribution methodology for force structure is independent of the requirements on
recruiting and the needs of the force structure composition (i.e., recruiting effort and
force structure requirements are independent).


Thirdly, we assumed that there is no bias in the structure lay-down, quality
assessment, positioning of recruiting assets, individual efforts of each recruiter, and
historical production information.


Lastly, it is reasonable to assume that vocations, lifestyle segmentation, LSCATs,
etc. are market influencers. Without the knowledge of these items, we could not obtain
necessary information about our population.


J. SUMMARY 


The analysis demonstrates that assessing unit positioning and market
quality pays off. The results of this analysis need to be studied further and included
as part of the constraint set in an optimizing distribution model. This provides the basis
for the improvement of stationing and recruiting for America’s Army.
V. SUMMARY, CONCLUSIONS, AND RECOMMENDATIONS


A. SUMMARY 


As with all analyses, we began this analysis with a problem. The problem was the
unit fill rate environment of the USAR. We structured the analysis as a sequence of
steps: we identified the problem, identified factors or components, developed a model,
collected the data, and determined the model’s validity.

This thesis is Phase I of a three-phase effort. Recall that the three phases are:
Phase I: Process Definition, Data Collection, and Data Scrubbing. 
Phase II: MOS Build — Populate Data Fields for the Optimization Model. 
Phase III: Construct and Complete the Optimization Model. 


We assembled the data on over 30,000 ZIP Codes, over 800 RCs, and over 260
Military Occupational Specialties (MOSs), drawing on and integrating over a dozen
disparate databases. This effort produced a single table with demographic, vocational,
and economic data on every ZIP Code in America, along with the six-year results of RA,
USAR, and Sister Service recruit production. Data was also obtained on the quality of
each recruit and his suitability for each of the 264 Army MOSs.


We see that regression, with the considered variables, yields a predictive model to
forecast numbers of contracts with suitable qualifications for each MOS. Preliminary
modeling developed a model that accounts for about 70% of the variation in recruit
production by ZIP Code. We also obtained the demographic and vocational composition of
the ZIP Codes.


Models for the top five USAR MOSs, contained in Appendix G, were also
developed to predict the maximum number of recruits obtained from a ZIP Code for that
MOS. ZIP Codes vary in their ability to provide recruits with sufficient aptitude for
technical fields, and this is illustrated in this thesis with examples.




This modeling gives new explanatory and predictive capability. We had
presumed that the unemployment rate of the ZIP Code would add explanation to the
regression. In each of the models, the unemployment rates were statistically significant.
However, it does not appear as though they are practically significant. In each case, we
see a negative coefficient in the model. This is likely due to confounding effects among
the predictors.


Remember, Phase I built models for only the top five MOSs. The Phase I proof of principle,
for the eventual optimization distribution model, is the development of the expected
number of contracts a ZIP Code produces and the models available for the top five MOSs in the
USAR inventory. The derivation of the MOS equations explains approximately 65% of
the variation of the data. This is not a perfect model (of course no model is), but it does
give explanatory and predictive capability not had previously. Phase I concludes with the
determination of the regression equation for the number of contracts a ZIP Code can
produce and the regression equations for the top five MOSs in the USAR.


The second thesis in the series, Phase II, will develop models for all 264 MOSs
and analyze them for commonalities and differences that reveal insights about recruit
production for the USAR. Once we accomplish this for the MOS inventory, we can
apply this to the constraint set in Phase III. This will also identify the regional
propensity of the market to join the USAR, by using an indicator variable in our
regression model. The third thesis will use those models as constraints in a mixed integer linear
program that positions the RCs to maximize their ability to man their units. The
assignment of RC market ZIP Codes to maximize unit fill rates leads to increased unit
readiness. This thesis creates an initial version of this program.


This thesis automates the process of assembling and reconciling key data files
using a commercial data-mining package called Clementine. That process is documented
so that future analysts can avoid the nearly three man-months of work it took to create the
master data file with its over 30,000 by 430 cells. This is a major contribution.


These results support the solution of the unit fill rate problem and address many
of the issues associated with determining the appropriate demographic, economic, and
vocational factors of RC markets. Together these three theses will provide a powerful
tool for analysis of optimal reserve force stationing. This will greatly improve the
readiness of the Reserve Components, unit deployment schedules, and Homeland
Security.


B. CONCLUSIONS 


This thesis assembled a database of recruiting, demographic, and economic data
by ZIP Code. This database enables the modeling of potential recruit production by ZIP
Code for the USAR. Since members of the USAR must in general live within 75 miles
or 90 minutes of their RC, ZIP Code level detail is important for understanding the
capability of a region to support its reserve units.


The assembly of this data set was a difficult task. The thesis outlines the
challenges and, more importantly, preserves the data mining algorithms developed in
Clementine so that the next analyst’s work can be greatly reduced.


The thesis developed regression models to predict the expected number of
contracts that a ZIP Code could produce, and upper bounds for the number of those
contracts that could be assigned to five representative MOSs. These expected values and
bounds by ZIP Code can be developed for all 264 MOSs and 30,000 ZIP Codes in the
United States, and that is proposed for a subsequent thesis. In turn, those values become
constraints for the positioning of reserve units. We develop an LP to address that
problem, and it is proposed as a third thesis.


The regression models explain about two-thirds of the variation in recruit
production and MOS potential. Remaining variation in recruit production is likely
affected by policy variables (such as incentives) not captured in the database. Remaining
variation in MOS potential likely reflects the underlying variability of educational
attainment in the population.


Some of the lessons learned in the Phase I process concern the variability of ZIP Codes
in demographic composition, among regions of the country, and across vocational
information as well. We set out to find an explanation of the relationship of our data
elements. We assumed that vocational, market lifestyle segmentation, unemployment
rates, etc. would be the explanatory variables between recruiting and unit fill. The
amount of explained variation in the data is about 70% for contracts and about 65% for
the five MOSs constructed.


We demonstrated that ZIP Codes have different quality composition even when
the numbers of recruits are similar. This quality aspect of the ZIP Code and subsequent
market is the key to getting the right type of unit in the right location. This supportability
is paramount to unit fill rates. The outcome of this analysis highlights the importance of
considering quality in stationing decisions.


As previously stated, there may be something not captured in the data. This may
be the periodicity of the data. This phenomenon could be explored to ascertain whether
time series is an appropriate model for consideration. There may be seasonality, trend,
and other information that was not captured in our developed model.


Subsequent analysis may be able to capture additional information in a time series
and subsequently incorporate these forecasts into a better predictive model.
The time series alternative should be explored to ascertain whether it might prove to be
more beneficial. Now that the data streams are complete, the analytical data runs are an
automated process, making it easier to update the data. All it takes now is time to
complete the stream runs in Clementine. The effort for Phase II can be concentrated on
the model for each of the MOSs.


These results support the solution of the unit fill rate problem and address many of the issues
associated with determining the appropriate demographic, economic, and vocational
factors of RC markets. When combined with Phase II and Phase III, the model in its
entirety will greatly contribute to unit personnel and training readiness. This will greatly
aid in the reliance on the Reserve Components, unit deployment schedules, and
Homeland Security.


We can and will provide the strength, fill the ranks, train and lead our units to be
the best combat multiplier in the world, today, tomorrow, and in the future.
C. RECOMMENDATIONS


I principally recommend that OCAR pursue the completion of the two successor
theses outlined in this thesis, so that unit fill potential is included in the discussion of
positioning of reserve units. This is particularly timely as the nation prepares for another
round of BRACs in 2005.


I also recommend that OCAR construct a data warehouse that automates the
collection of the data using the methods in this thesis, and that automatically reconciles
the discrepancies discovered in this thesis. It would be an easy task to assign to a
contractor, and would greatly improve the ability of the entire USAREC analyst
community to model local effects on recruit production.


These models explain about 70% of the variation in recruit production. This
demonstrates the effectiveness of regression and its predictive nature. Phase II needs to
continue to pursue the number of contracts and the MOS build. I recommend exploring
the use of time series for the MOS models. Phase I results have predictive
power, but there may be other factors that will explain additional variation in the data.


Currently each RC has associated market ZIP Codes. I recommend using this process
to determine those ZIP Codes more appropriate for the current force structure or to give
insight as to the type of force structure best supported by the market. In each case, we
can derive through the analysis the appropriate MOS, vocational, or lifestyle
segmentation aspects for each RC.


It would also be advisable to ensure that future studies, structure placement
initiatives, recruiter placement initiatives, and any other initiatives be closely
coordinated between the USARC, USAREC, and a Joint Partnership Evaluation Team
responsible for ensuring transitional initiatives are planned, coordinated, and executed in
unison for America’s Army.
APPENDIX A: TABLE DEFINITIONS DICTIONARY


This is the data definition dictionary for the tables used in the analysis of USAR
unit fill.


This thesis used the following data and tables to determine correlations
between elements and to use them to predict the outcome of stationing actions:


TABLE/SOURCE                    DEFINITION

FRC_FILE.DBF (74,176 Records) / OCAR
    The table has the structure of every unit in the USAR and its authorized,
    required, and assigned strength totals. It will be used to determine the units
    needing fill or having fill problems. The UNIT FILL RATE = ASSIGNED STRENGTH /
    AUTHORIZED (by MOS) for each unit in the USAR.

PM03.DBF (4,778,080 Records) / USAREC
    The table has the Military Available population by race, ethnicity, gender, and
    ZIP Code level of detail. The data is for FY 2000-2020, projected with the
    growth rates of the population anticipated from trend analysis. Data is current
    as of FY2003.

MV50.DBF (43,362 Records) / USAREC
    The table has the Microvision Lifestyle Segmentation for each ZIP Code. It will
    determine the most prominent segments in the ZIP Code and will be used to
    determine correlations among enlistments and MOS skill sets at the ZIP Code,
    FIPS Code, or Reserve Center level. (See Appendix D)

ALLARMY.DBF (459,761 Records) / USAREC
    The table has the quality of enlistments for the Army for FY1999-FY2003 by
    ZIP Code for each applicant who made entry into the USAR. It contains
    contract data for all components of the Army.

SISSERV.DBF (646,816 Records) / USAREC
    The table has Sister Service contract data for FY 1999-2003. It will be used to
    determine Sister Service competition in a market.

QUALS.DBF (458 Records) / USAREC
    The table has the required ASVAB test scores, by category, for each MOS in
    the inventory. Its use will be to determine the minimum required test score for
    each applicant to obtain an MOS. If the market cannot test high enough to
    obtain an MOS, we conclude the RC may not support the MOS.

P050.DBF (33,178 Records) / USBC
    The table has the Bureau of Labor Statistics vocational data for each ZIP
    Code. It contains information by vocation on the working population aged 16-69
    in each ZIP Code. This information will be used to determine the most
    prominent vocation of each ZIP Code and to determine a correlation of MOS
    skills with the market/ZIP Code.

RCMKT75.DBF (387,872 Records) / USAREC
    The table has the market ZIP Codes for each RC. A market ZIP Code is not
    unique; it may be a market ZIP for multiple RCs. The market ZIPs are those
    within 75 miles of each RC.

LAUCNTY.DBF (3,218 Records) / BLS
    The table has the Employment and Unemployment Data for each County in the
    US, verified for 2003. This table has the Labor Force, Employed Labor Force,
    Unemployed Labor Force, and the Unemployment Rate for each US County.
    Unemployment Rate = Unemployed / Labor Force

ep.data.1.AllData.DBF (130,904 Records) / BLS
    The table has the Employment Data for each State from 1981-1999. The table
    has both seasonal and unseasonal data.

gp.state.DBF (52 Records) / BLS
    The table has the numerical State codes for each of the fifty states plus those
    for DC and Puerto Rico.
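The two ratios defined in the table above can be sketched in a few lines. This is only an illustration of the arithmetic, not the Clementine streams used in the thesis; the function names and the sample counts are hypothetical.

```python
def unit_fill_rate(assigned: int, authorized: int) -> float:
    """UNIT FILL RATE = ASSIGNED STRENGTH / AUTHORIZED (per unit, per MOS)."""
    if authorized <= 0:
        raise ValueError("authorized strength must be positive")
    return assigned / authorized


def unemployment_rate(unemployed: int, labor_force: int) -> float:
    """Unemployment Rate = Unemployed / Labor Force (per county)."""
    if labor_force <= 0:
        raise ValueError("labor force must be positive")
    return unemployed / labor_force


# Hypothetical counts, for illustration only.
print(unit_fill_rate(assigned=45, authorized=50))             # 0.9
print(unemployment_rate(unemployed=1200, labor_force=24000))  # 0.05
```

Guarding against a zero denominator matters here because a unit with no authorizations for an MOS has no meaningful fill rate for that MOS.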




APPENDIX B: TABLE DATA FIELDS AND DESCRIPTIONS 


This is the data field and field descriptions for each table used in the analysis of 
USAR unit fill: 


TABLE    FIELD NAMES    FIELD DESCRIPTION

(Derived Table)
    ZIP         = ZIP Code for the Data Elements
    P01 TOT     = Total Working Population in ZIP Code
    TOT_yyyyyy  = Total Categorical Working Population in ZIP Code
                  (Same as MALE + FEMALE for Category)
                  [yyyyyy] = MGTPRO, BUSFIN, MGTOTH,
                  FRMMGR, BUSFI2, BUSOPS, FINSPC,
                  PRFSNL, CMPMTH, ARCENG, ARCSUR,
                  DRENMA, LPSSCI, CMSOSV, LGLOCC,
                  EDTRLI, ARETSP, HLTPRA, HDITRT,
                  HLTTCH, SVCOCC, HLTSPT, PRTSVC,
                  FFPRLW, PRTOTH, FDPRSV, BLGRCL,
                  PSLSVC, SALOFF, SALOCC, ADMSPT,
                  FMFIFO, CNEXMT, CONEXT, SUPCON,
                  CONTRD, EXTRTN, INMTRP, PRTRMA,
                  PRDOCC, TRMAMV, SUPTRA, ACRATC,
                  VEHOPR, RLWTOT, MTLMOV
    M02 MALE    = Total Male Working Population in ZIP Code
    Mxx_yyyyyy  = Total Categorical Male Working Population in ZIP Code
                  [xx] = 03-48, [yyyyyy] = Same as previous
    F49 FEMALE  = Total Female Working Population in ZIP Code
    Fxx_yyyyyy  = Total Categorical Female Working Population in ZIP Code
                  [xx] = 49-95, [yyyyyy] = Same as previous
    TTL MV50    = Total Count of MV Segments in the ZIP Code
    PCT_MVxx    = Percentage of MVxx Segment in the ZIP Code

FRC_FILE.DBF
    UIC         = The Unit Identification Code
    ACTCO       = The Activation Code of the pending action
    EDATE       = The Effective Date of the pending action
    UNMBR       = The Unit Number
    STNNMR      = The Station Number or ZIP Code
    LOCCO       = The Location Code or State of the unit
    STOFF       = The Stationed count of Officers in the unit
    STWOF       = The Stationed count of Warrant Officers in the unit
    STENL       = The Stationed count of Enlisted in the unit
    AUOFF       = The Authorized count of Officers in the unit
    AUWOF       = The Authorized count of Warrant Officers in the unit
    AUENL       = The Authorized count of Enlisted in the unit
    TIER        = The Tier level of the unit
    STACO       = The Station Code of the unit
    LASTUPDT    = The Last Date of the entry of information for the unit
    FY          = The Fiscal Year of the pending action

PM03.DBF
    ZIPCODE     = The ZIP Code of the population information
    RACE        = The Race of the population
    SEX         = The Sex of the population
    Y2000       = The Year of the population information (Year range [2000, 2020])
                = The Age of the population


MV50.DBF (See Appendix D)
    ZIP         = The ZIP Code for the Data Elements
    MVxx        = Count of MV Segments in the ZIP Code, [xx] = 01-50
    TTL MV50    = Total Count of MV Segments in the ZIP Code
    PCT_MVxx    = Percentage of MVxx Segment in the ZIP Code

ALLARMY.DBF
    FY          = The Fiscal Year of the accession action
    SSN         = Individual's Social Security Number
    AFQT        = The Armed Forces Qualification Test Score (0-99)
    GT          = General Technical Categorical ASVAB Line Score
    GM          = General Mechanical Categorical ASVAB Line Score
    EL          = Electrical Categorical ASVAB Line Score
    CL          = Clerical Aptitude Categorical ASVAB Line Score
    MM          = Mechanical Maintenance Categorical ASVAB Line Score
    SC          = Signal & Communications Categorical ASVAB Line Score
    CO          = Combat Operations Categorical ASVAB Line Score
    FA          = Field Artillery Categorical ASVAB Line Score
    OF          = Operations & Food Service Categorical ASVAB Line Score
    ST          = Science & Technology Categorical ASVAB Line Score
    RCZIP       = The Reserve Center ZIP Code, if any, of the accession action
    COMP CD     = Component Code (G-Guard, V-Reserve, R-Regular Army)
    ZIP         = ZIP Code for Data Element
    SEGMENT     = Microvision Lifestyle Segment (1-50)
    UIC         = Unit Identification Code of the Unit the Individual joined
    MOS         = Military Occupational Specialty Code
    RDOE        = The Reserve Date of Enlistment of the accession action
    SKILL LEVEL = Skill Level of the MOS (1-5)

SISSERV.DBF
    M_ZIP       = ZIP Code for the Data Elements
    SERV COMP   = The Service Component for the accession
    SEX         = The Sex of the accession
    MEP RACE    = The Race of the accession
    MEP ETHIN   = The Ethnic Code of the accession
    DOB         = The Date of Birth of the accession
    EDYRS       = The Years of Education of the accession
    EDLEVEL     = The Education Level of the accession [0-24]
    HEIGHT      = The Height of the accession
    WEIGHT      = The Weight of the accession
    PUHLES      = The PULHES scores from the accession physical
    AFQT        = The AFQT Score of the accession
    ZZ_SCORE    = The Categorical Raw ASVAB Scores for the accession
                  [zz] = GS, AR, WK, PC, NO, CS, AS, MK, MC, EI, and VE
    TSC         = The Test Score Category of the accession
    MS FY       = The Fiscal Year of the accession

QUALS.DBF
    MOS4        = The 4 Character Military Occupational Specialty (MOS)
    CMF         = Career Management Field (CMF) of the MOS
    CMF DESCR   = Description of the Numerical CMF
    VOCATN      = The BLS Vocation (13 Major Categories)
    AFQT        = The Armed Forces Qualification Test Score (0-99)
    GT          = General Technical Categorical ASVAB Line Score
    GM          = General Mechanical Categorical ASVAB Line Score
    EL          = Electrical Categorical ASVAB Line Score
    CL          = Clerical Categorical ASVAB Line Score
    MM          = Mechanical Maintenance Categorical ASVAB Line Score
    SC          = Signal & Communications Categorical ASVAB Line Score
    CO          = Combat Operations Categorical ASVAB Line Score
    FA          = Field Artillery Categorical ASVAB Line Score
    OF          = Operations & Food Service Categorical ASVAB Line Score
    ST          = Science & Technology Categorical ASVAB Line Score

P050.DBF
    ZIP         = ZIP Code for the Data Elements
    P01 TOT     = Total Working Population in ZIP Code
    TOT_yyyyyy  = Total Categorical Working Population in ZIP Code
                  (Same as MALE + FEMALE for Category)
                  [yyyyyy] = MGTPRO, BUSFIN, MGTOTH,
                  FRMMGR, BUSFI2, BUSOPS, FINSPC,
                  PRFSNL, CMPMTH, ARCENG, ARCSUR,
                  DRENMA, LPSSCI, CMSOSV, LGLOCC,
                  EDTRLI, ARETSP, HLTPRA, HDITRT,
                  HLTTCH, SVCOCC, HLTSPT, PRTSVC,
                  FFPRLW, PRTOTH, FDPRSV, BLGRCL,
                  PSLSVC, SALOFF, SALOCC, ADMSPT,
                  FMFIFO, CNEXMT, CONEXT, SUPCON,
                  CONTRD, EXTRTN, INMTRP, PRTRMA,
                  PRDOCC, TRMAMV, SUPTRA, ACRATC,
                  VEHOPR, RLWTOT, MTLMOV
    M02 MALE    = Total Male Working Population in ZIP Code
    Mxx_yyyyyy  = Total Categorical Male Working Population in ZIP Code
                  [xx] = 03-48, [yyyyyy] = Same as previous
    F49 FEMALE  = Total Female Working Population in ZIP Code
    Fxx_yyyyyy  = Total Categorical Female Working Population in ZIP Code
                  [xx] = 49-95, [yyyyyy] = Same as previous

RCMKT75.DBF
    RCZIP       = The RC ZIP Code
    MKTZIP      = A Market ZIP Code of the RC. ZIP Codes are within a 75-mile
                  radius of the RC.

LAUCNTY.DBF
    LAUS CODE   = The Local Area Unemployment Code
    ST FIPS     = The State FIPS used by BLS and USBC. (Same as gp.state.DBF)
                  01=Alabama, 02=Alaska, ..., 56=Wyoming
    CNTY NAME   = The County Name
    ST NAME     = State Name for each State used in the table.
    ST ABBR     = 2 Letter State Abbreviation as provided by BLS and USBC.
    YEAR        = The Year of the information.
    LBR_FRC     = Labor Force Population in the County.
    EMPL        = Employed Labor Force in the County.
    UNEMPL      = Unemployed Labor Force in the County.
    UNEMPL RATE = Unemployment Rate for the County. (UNEMPL/LBR_FRC)

ep.data.1.AllData.DBF
    SERIES ID   = The Series Identification Number (GPU00100000E0000)
    YEAR        = The Year of the Data
    PERIOD      = The Period of the Data
    VALUE       = The Value of the Data
    FOOTNOTE    = The Footnote Codes of the Data (Variable Information)
                  The series_id (GPU00100000E0000) can be broken out into:
                  survey abbreviation=GP, seasonal (code)=U,
                  area type code=0, state code=01, area code=0000,
                  labor force code=E, charact_code=0000

gp.state.DBF
    STATE CODE  = State Code used by BLS and USBC.
                  01=Alabama, 02=Alaska, 04=Arizona, 05=Arkansas,
                  06=California, 08=Colorado, 09=Connecticut, 10=Delaware,
                  11=D.C., 12=Florida, 13=Georgia, 15=Hawaii, 16=Idaho,
                  17=Illinois, 18=Indiana, 19=Iowa, 20=Kansas, 21=Kentucky,
                  22=Louisiana, 23=Maine, 24=Maryland, 25=Massachusetts,
                  26=Michigan, 27=Minnesota, 28=Mississippi, 29=Missouri,
                  30=Montana, 31=Nebraska, 32=Nevada, 33=New Hampshire,
                  34=New Jersey, 35=New Mexico, 36=New York, 37=North
                  Carolina, 38=North Dakota, 39=Ohio, 40=Oklahoma,
                  41=Oregon, 42=Pennsylvania, 44=Rhode Island, 45=South
                  Carolina, 46=South Dakota, 47=Tennessee, 48=Texas, 49=Utah,
                  50=Vermont, 51=Virginia, 53=Washington, 54=West Virginia,
                  55=Wisconsin, 56=Wyoming
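Because a market ZIP Code in RCMKT75.DBF may belong to more than one RC, the RCZIP/MKTZIP pairs form a many-to-many relationship. A minimal sketch of how such pairs could be grouped in either direction follows; the sample ZIP Codes and the helper name `group_pairs` are hypothetical and are not taken from the thesis data.

```python
from collections import defaultdict

# Hypothetical (RCZIP, MKTZIP) pairs shaped like RCMKT75.DBF records.
# ZIP Codes are kept as strings so that leading zeros survive.
rows = [
    ("10001", "10002"),
    ("10001", "07030"),
    ("07102", "07030"),
]


def group_pairs(pairs, key_index):
    """Group the other column of each pair under the column at key_index."""
    grouped = defaultdict(set)
    for pair in pairs:
        grouped[pair[key_index]].add(pair[1 - key_index])
    return dict(grouped)


markets_by_rc = group_pairs(rows, key_index=0)  # RCZIP -> its market ZIPs
rcs_by_market = group_pairs(rows, key_index=1)  # MKTZIP -> RCs that draw on it

# Market ZIPs that appear under more than one RC.
shared = {z: rcs for z, rcs in rcs_by_market.items() if len(rcs) > 1}
print(shared)
```

Keeping both directions makes it easy to see which RCs compete for the same market ZIP Code.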
APPENDIX C: OCCUPATIONS AND WORKING CLASS CATEGORIES


White Collar Category Occupations 


Executive and Managerial: [EXECMNGE] 
Legislators 

Chief Executives and General Administrators, Public Administration 
Administrators and Officials, Public Administration 
Administrators, Protective Services 

Financial Managers 

Personnel and Labor Relations Managers 

Purchasing Managers 

Managers, Marketing, Advertising, and Public Relations 
Administrators, Education and Related Fields 
Managers, Medicine and Health 

Managers, Properties and Real Estate 

Postmasters and Mail Superintendents 

Funeral Directors 

Managers and Administrators 

Management Related Occupations 


Professional Specialty: [PROFSNL] 
Mathematical and Computer Scientists 

Natural Scientists 

Architecture and Engineering Occupations 
Architects, Surveyors, Cartographers, and Engineers 
Health Diagnosing Occupations 

Health Assessment & Treating Occupations 
Teachers, Post-secondary 

Teachers, except Post-secondary 

Counselors, Educational and Vocational
Librarians, Archivists, and Curators
Social Scientists and Urban Planners 

Social, Recreation, and Religious Workers 


Technical Support: [TECHSPT] 

Health Technologists and Technicians 

Technologists & Technicians, except Health 
Drafters, Engineering, and Mapping Technicians 
Science Technicians 

Technicians, except Health, Engineering, and Science 
Sales Occupations: [SALES]

Supervisors and Proprietors 

Sales Occupations 

Sales Representatives 

Commodities except Retail 

Sales Workers, Retail and Personal Services and Sales Related Occupations 


Administrative Support: [ADMINSPT] 
Supervisors 

Administrative Support Occupations 

Computer Equipment Operators 

Secretaries, Stenographers, and Typists 

Information Clerks 

Records Processing Occupations, except Financial 
Financial Records Processing Occupations 
Duplicating, Mail & Other Office Machine Operators 
Communications Equipment Operators 

Mail and Message Distributing Occupations 

Material Recording, Scheduling, and Distributing Clerks N.E.C.

Adjusters and Investigators 

Miscellaneous Administrative Support Occupations 








Blue Collar Category Occupations 


Farm, Forestry & Fish: [FAFOFISH] 
Farm Operators and Managers 

Other Agricultural and Related Occupations 
Forestry and Logging Occupations 

Fishers, Hunters, and Trappers 


Laborers: [LABORERS] 

Supervisors, Handlers, Equipment Cleaners
Helpers, Mechanics and Repairers
Helpers, Construction and Extractive Occupations
Construction Laborers
Production Helpers 

Freight Stock and Materials Handlers 

Garage and Service Station, Related Occupations 

Vehicle Washers and Equipment Cleaners 

Hand Packers 
Other Service (except Protective & Household): [SVCOTHR]

Arts, Design, Entertainment, Sports, and Media Occupations 

Food Service Preparation and Service Occupations 

Health Service Occupations 

Cleaning and Building Service Occupations, except Household 

Personnel Service Occupation 

Launderers and Ironers 

Cooks, Private Household 

Housekeepers and Butlers 

Childcare Workers, Private Households
Private Household Cleaners and Servants


Precision Craftsmen: [CRFTSMAN] 

Mechanics and Repairers 

Construction Trades 

Construction Trades, except Supervisors 

Extractive Occupations 

Precision Production Occupation 

Precision Woodworking 

Precision Textile, Apparel, and Furnishings Machine Operators 

Precision Food Production 

Precision Inspectors, Testers, and Related Workers 

Plant and System Operators 

Metal Working and Plastic Working Machine Operators
Fabricating Machine Operators
Metal and Plastic Processing Machine Operators
Woodworking Machine Operators
Printing Machine Operators

Textile, Apparel, and Furnishing Operators
Machine Operators, Assorted Materials


Protective Service: [SVCPROT] 
Supervisors, Protective Service Occupation 
Firefighting and Fire Prevention 

Police and Detectives 


Guards 


Transportation & Material Moving: [TRANSPO] 
Aircraft and Traffic Control Operators 

Motor Vehicle Operators 

Transportation Occupations, except Motor Vehicles 
Railroad Transportation 

Water Transportation 

Material Moving Equipment Operators 

Production, Transportation, and Material Moving Occupations 
Operating Engineers 

Long Shore 

Hoist & Winch Operators
Crane & Tower Operators
P050 TABLE NUMBER & DESCRIPTION

MALE | FEMALE | DESCRIPTION | CATEGORY
P050002 | P050049 | Total in Population |
 | | occupations |
 | | operations occupations |
 | | farm managers |
 | | occupations |
P050010 | P050057 | Professional and related occupations | PROFSNL
P050011 | P050058 | Computer and mathematical occupations | PROFSNL
P050012 | P050059 | Architecture and engineering occupations | PROFSNL
P050013 | P050060 | Architects, surveyors, cartographers, and engineers | PROFSNL
P050014 | P050061 | Drafters, engineering, and mapping technicians | TECHSPT
 | | | PROFSNL
 | | Arts, design, entertainment, sports, and media occupations |
 | | occupations |
 | | and technical occupations |
 | | | TECHSPT
 | | | SVCOTHR
P050024 | P050071 | Healthcare support occupations | TECHSPT
P050025 | P050072 | Protective service occupations | SVCPROT
 | | Fire fighting, prevention, and law enforcement workers, including supervisors | SVCPROT
 | | Supervisors |
P050028 | P050075 | Food preparation and serving related occupations | SVCOTHR
P050029 | P050076 | Building and grounds cleaning and maintenance occupations | SVCOTHR
P050030 | P050077 | Personal care and service occupations | SVCOTHR
P050031 | P050078 | Sales and office occupations | SALES
P050032 | P050079 | Sales and related occupations | SALES
P050033 | P050080 | Office and administrative support occupations | ADMINSPT
P050034 | P050081 | Farming, fishing, and forestry occupations | FAFOFISH
P050035 | P050082 | Construction, extraction, and maintenance occupations | CRFTSMAN
P050036 | P050083 | | CRFTSMAN
P050037 | P050084 | Supervisors, construction and extraction workers | LABORERS
P050038 | P050085 | | CRFTSMAN
P050039 | P050086 | | CRFTSMAN
 | | occupations |
 | | moving occupations |
P050042 | P050089 | | TRANSPO
 | | occupations |
 | | moving workers |
P050045 | P050092 | Aircraft and traffic control occupations | TRANSPO
P050046 | P050093 | Motor vehicle operators | TRANSPO
P050047 | P050094 | Rail, water and other transportation occupations | TRANSPO
P050048 | P050095 | Material moving workers | TRANSPO

NOTE: Tables and Descriptions provided by the US Bureau of the Census
APPENDIX D: MICROVISION 50 LIFESTYLE SEGMENTS

SEG # | SEGMENT NAME | SEGMENT DESCRIPTION | GRP # | GROUP NAME
1 | Upper Crust | Metropolitan couples and families, very high income and education, homeowners, very high property values, managers/professionals | 1 | Accumulated Wealth
2 | Lap of Luxury | Families, teens, very high income and education, homeowners, managers/professionals, 2-worker families | 1 | Accumulated Wealth
3 | Established Wealth | School-age families, high income, high education, homeowners, managers and professionals | 1 | Accumulated Wealth
4 | Mid-Life Success | Families with high education, high income, managers/professionals, technical/sales | 1 | Accumulated Wealth
5 | Prosperous Metro Mix | Families with young children, high education, high income, managers/professionals, technical/sales | 1 | Accumulated Wealth
6 | Good Family Life | Families, children age 5-17, very high education, high income, executives, managers/professionals, technical/sales, homeowners | 1 | Accumulated Wealth
7 | Comfortable Times | Middle-aged heads of household, families, high income, medium-high education, technical/sales, managers/professionals | | Conservative Classics
8 | Movers and Shakers | Singles and couples, students and recent graduates, high education and income, managers/professionals, technical/sales | 4 | Mainstream Singles
9 | Building a Home Life | School-age families, new housing, medium-high education, technical/sales, managers/professionals | 3 | Young Accumulators
10 | Home Sweet Home | Married couples, one or no children, some retirees, medium-high income and education, managers/professionals, technical/sales | 2 | Mainstream Families
11 | Family Ties | Large families, medium education, medium-high income, technical/sales, precision/crafts, two workers | 2 | Mainstream Families
12 | A Good Step Forward | Mobile singles, high education, medium income, often renters, managers/professionals, technical/sales | 4 | Mainstream Singles
13 | Successful Singles | Urban areas, renters, young singles and couples, older housing, ethnic mix, high education, medium income, managers/professionals | | Sustaining Singles
14 | Middle Years | Mid-life couples, families, medium-high education, mixed occupations, medium income | 1 | Accumulated Wealth
15 | Great Beginnings | Young, singles and couples, medium-high education, medium income, some renters, managers/professionals, technical/sales | 4 | Mainstream Singles
16 | Country Home Families | Large families, rural areas, medium education, medium income, precision/crafts - trades | 2 | Mainstream Families
17 | Stars and Stripes | Young heads of household, large families with school-age children, medium income and education, some military, precision/craft | 2 | Mainstream Families
18 | White Picket | Young families, low to medium ... precision/crafts, laborers | 2 | Mainstream Families
19 | Young and Carefree | Young, singles and couples, no kids, medium income, medium-high education, technical/sales, managers/professionals | 3 | Young Accumulators
20 | Secure Adults | Mature/seniors, metro fringe areas, singles and couples, medium income, medium education, mixed occupations and some retirees | | Conservative Classics
21 | American Classics | Seniors, singles and couples, no kids, suburban areas, medium income, medium education, mixed occupations and some retirees | | Conservative Classics
22 | Traditional Times | Seniors, no kids, low education levels, medium income, laborers, precision/crafts workers, some retirees | 2 | Mainstream Families
23 | Settled In | Empty nesters, no kids, medium education and income, some retirees, technical/sales and service occupations | 2 | Mainstream Families
24 | City Ties | School-age families, urban areas, African-American, average income, average education, service and laborer occupations | | Sustaining Families
25 | Bedrock America | School-age families, medium income, low-medium education, precision/crafts, military, laborers | 3 | Young Accumulators
26 | The Mature Years | Couples and small families, medium income, low-medium education, precision/crafts, laborers | 7 | Cautious Couples
27 | Middle of the ... | School-age families, medium income, ... education levels, mixed occupations | | ... Families
28 | Building a Family | Families, school-age children, medium income, medium-low education, mixed occupations | 3 | Young Accumulators
29 | Establishing Roots | Families with kids of all ages, medium income, low education, mixed occupations | 5 | Asset-Building Families
30 | Domestic Duos | Mature/seniors, singles and couples, no kids, medium-low income, mixed housing, medium education, technical/sales, managers/professionals, some retirees | | Conservative Classics
31 | Country Classics | Middle-aged to mature heads of household, seniors, medium-low income, low education, some mobile homes, laborers | | Conservative Classics
32 | Metro Singles | Singles, renters, urban areas, multi-unit housing, low education, medium-low income, technical/sales, laborers | 4 | Mainstream Singles
33 | Living Off the Land | Rural areas, school-age families, medium-low income, low education, farming/fishing, laborers | 7 | Cautious Couples
34 | Books and New Recruits | Young, high education, medium-low income, students, managers/professionals, service occupations, some military, renters | 4 | Mainstream Singles
35 | Buy American | Families with school-age kids, medium-low income, low education, laborers | 2 | Mainstream Families
36 | Metro Mix | Young singles, no kids, ethnic mix, medium-low income, mostly renters, multi-unit housing, use public transportation | | Sustaining Singles
37 | Urban Up and Comers | Young, singles, ethnic mix, renters, multi-unit housing, high education, medium-low income, managers/professionals | | Sustaining Singles
38 | Rustic Homesteaders | Rural areas, families, school-age kids, low education, medium-low income, some mobile homes, farming/fishing, laborers | 2 | Mainstream Families
39 | On Their Own | Mix of young and seniors, singles and couples, medium-low income, medium-high education, managers/professionals, technical/sales, some renters | 4 | Mainstream Singles
40 | Trying Metro Times | Mix of young and seniors, urban, ethnic mix, low income, older housing, owners and renters, low education levels, varied occupations | 4 | Mainstream Singles
41 | Close Knit Families | Primarily Hispanic, large families, kids of all ages, low income and education, precision/craft occupations and laborers | | Sustaining Families
42 | Trying Rural Times | Large families, ethnic mix, low income and education, some mobile homes, service occupations, laborers | | Sustaining Families
43 | Manufacturing | Largely African American, singles and families, older housing, low income and education, service and laborer occupations | | Sustaining Families
44 | Hard Years | Young adults and seniors, low income and education, older multi-unit housing, renters, service occupations, laborers | | Sustaining Families
 | Difficult Times | Primarily African-American, school-age families, urban areas, very low income, low education, laborers and service occupations | | Sustaining Families
 | University USA | Students and singles, dorms and group quarters, very low income, medium-high education, technical/sales | | Sustaining Singles
 | Struggling Metro Mix | Young, singles, urban, cultural mix, renters, low income, mixed education levels, older multi-unit housing | | Sustaining Singles
48 | Urban Singles | Mix of young and seniors, singles, renters, old multi-unit housing, urban areas, very low income, mixed education levels, service occupations, technical/sales | | Sustaining Singles
 | | No homogeneity | |
 | Unclassified | Post Office Boxes and unclassified population | | Unclassified
APPENDIX E: CLEMENTINE SCREEN SNAPSHOTS

Figure E.1: Clementine Screen Snapshot - QUAL Data Collection


NOTE: All data streams created in Clementine have been saved to a file for future work
(Phases II and III). Copies were distributed to my Thesis Advisor, Dr. David H. Olwell, and
Second Reader, Dr. Samuel E. Buttrey. These files are also available by request from the
author for follow-on analysis.


Figure E.2: Clementine Screen Snapshot - USARTOT Data Collection

Figure E.3: Clementine Screen Snapshot - RCMKT75 Data Collection

Figure E.4: Clementine Screen Snapshot - JOBMV50 & MAPOPLAU Data Collection

Figure E.5: Clementine Screen Snapshot - SISSERV Data Collection

Figure E.6: Clementine Screen Snapshot - ALLARMY (Part 1 of 2) Data Collection













Figure E.7: Clementine Screen Snapshot - ALLARMY (Part 2 of 2) Data Collection

Figure E.8: Clementine Screen Snapshot - ALLDATAbyZIP Data Collection


APPENDIX F: DATA TABLE DERIVATION 


Derived tabular information produced by Clementine streams. Appendix E 
(Clementine Screen Snapshots) contains the graphical representation of the information. 
Tables derived from collected data contain the following information: 


TABLE                           DERIVATION

RC VCTNS&QUAL RQD
    Produced by merging the USARTOT structure by MOS information and the MOS
    QUAL table.

RCZip TOT ALLOCATION
    Produced by merging the USARTOT structure by MOS information and the
    RCMKT75 table.

JOBMVPOP
    Produced by merging the JOBMV50new table and the MAPOLAU table. The
    MAPOLAU table has the BLS Vocational, MA population, and the Local Area
    Unemployment statistics.

SISERVAFQT
    Produced by building the Sister Service component AFQT information.

ARMYbyZIP
    Produced by building the Army component AFQT information, LSCAT
    information, MV50 Segmentation information, and MOS Qualification by ZIP
    Code information, and subsequently merging the separate pieces of
    information.

ARMYbyMOSbyZIP
    Produced by conducting a quality check of each MOS with contract LSCAT
    data. Each MOS by ZIP Code was compared to the LSCAT of the contract. If the
    contract LSCAT = MOS needed LSCAT then the contract qualified for the MOS,
    otherwise it did not.

ALLDATAbyZIP
    Produced by merging the JOBMVPOP, SISERVAFQT, ARMYbyZIP, and
    ARMYbyMOSbyZIP information.
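The quality check behind ARMYbyMOSbyZIP can be sketched as a comparison of a contract's line scores against the scores an MOS requires. This is a hedged illustration only: it assumes the comparison means "meets or exceeds", following the "minimum required test score" wording in Appendix A, and the MOS labels, thresholds, and the function name `qualifies` are hypothetical rather than values from the QUALS table.

```python
def qualifies(contract_scores, mos_minimums):
    """Return True when every line score the MOS requires is met.

    Assumption: a contract qualifies when each score is at or above the
    MOS minimum; a missing score counts as not meeting the requirement.
    """
    return all(
        contract_scores.get(area, 0) >= minimum
        for area, minimum in mos_minimums.items()
    )


# Hypothetical requirements keyed by MOS (not real QUALS.DBF values).
required = {
    "MOS_A": {"GT": 100, "EL": 95},
    "MOS_B": {"CL": 90},
}

# Hypothetical line scores for one contract.
contract = {"GT": 105, "EL": 92, "CL": 98}

qualified = sorted(
    mos for mos, minimums in required.items() if qualifies(contract, minimums)
)
print(qualified)  # ['MOS_B']
```

Counting the qualified contracts per MOS and per ZIP Code would then give the kind of by-MOS, by-ZIP totals the table name suggests.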
APPENDIX G: TOP FIVE MOS REGRESSION EQUATIONS


LINEAR REGRESSION MODEL FORMULATION 


52D | FULL MODEL:
q.52D.Avg.Annl ~ un.rate + MA.POP + EXECMNGE + FAFOFISH +
ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES +
CRFTSMAN + LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 +
MV50GP04 + MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 +
MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
a eo. SO wy lA) 0 20S L0 3 0.1 US) OG 


Coefficients Value Std.Error t-value Pr(>|t|)


(Intercept) Oe O70 0.0093 11.4496 0.0000 
un.rate -1.2145 0.1402 —-8.6628 0.0000 
MA.POP DOO. 0.0000 2a AZo 0.0000 
EXECMNGE -0O.0001 0.0000 =16%.5890 0.0000 
FAFOFISH -0.0003 020000 Sil oe Oo 0:30:00: 
ADMINSPT O20 002 Oe O00 TO 2591.6 0.0000 
PROFSNL 0.0001 0.0000 12.0986 0.0000 
TECHS? T 0.0004 0.0000 17.8766 O00 OO 
SVCOTER 0.0000 0.0000 2.2840 0.0224 
SVCPROT -0.0001 0.0000 =3'.3660 0.0008 
SALES 0:00: Os 00:00 SELL 00000 
CRFTSMAN -0O.O0001 0.0000 Sala aA es 0.0000 
LABORERS -0.0009 0002 Soin, Jou 0.0000 
TRANSPO 0.0000 0 BO00:0 ee ENB) 0.0061 
MV50GPO1 0:. 0000 0... 00:00 = 3-44 6.5 1 0.0016 
MV50GP02 0.0001 0.0000 1360-0 1 0.0000 
MV50GP03 0.0007 020000 3 3340 orld. 0.0000 
MV50GP04 0.0000 0.0000 Se ee 0.0000 
MV50GP05 -0.0009 OOO =P 0296 0.0000 
MV50GP06 -0O.0001 Oa ONONOKE =i 6 FO LS O.0:00:0 
MV50GP07 0.0000 Os OOO2 02259 1. 0.7956 
MVS50GP0S. —Q:0.0:01. 0.0000 St) 0.0000 
MV50GP09 -0O.0001 Oe 0C:00 =16.49 19 0.0000 
MV50GP10 -0.0013 0003 =4°,.04 72 0.0001 
MV50GP11 0.0000 OOOO 0.0380 0.9697 


Residual standard error: 0.5539 on 29839 degrees of freedom 

Multiple R-Squared: 0.6559 

F-statistic: 2370 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
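As a consistency check on the summary lines above, the overall F-statistic of a linear regression can be recovered from the reported R-squared, the number of predictors k, and the residual degrees of freedom, using F = (R^2 / k) / ((1 - R^2) / df). The sketch below applies that standard identity to the values printed for the 52D full model; the function name is an assumption for illustration.

```python
def f_statistic(r_squared: float, n_predictors: int, residual_df: int) -> float:
    """Overall F-statistic implied by R-squared for a linear regression."""
    explained = r_squared / n_predictors          # explained variation per predictor
    unexplained = (1.0 - r_squared) / residual_df  # residual variation per df
    return explained / unexplained


# Values reported for the 52D full model:
# Multiple R-Squared 0.6559, 24 predictors, 29839 residual degrees of freedom.
f_52d = f_statistic(0.6559, 24, 29839)
print(round(f_52d))  # 2370
```

The result agrees with the reported F-statistic of 2370 to within the rounding of R-squared.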
LINEAR REGRESSION MODEL FORMULATION


52D | FULL MODEL LESS MA.POP and un.rate: 
q.52D.Avg.Annl ~ EXECMNGE + FAFOFISH + ADMINSPT + PROFSNL + 
TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN + LABORERS + 
TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 + MV50GP05 + 
MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 + MV50GP11 


Residuals: Min 1Q Median 3Q Max
=S3e02. =O. 146) SO20AS4: «O2100.09- 0-66 


Coefficients Value Std.Error t-value Pr(>|t|)


(Intercept) 00.350 0.0046 Te OOO2 0, 0000 
EXECMNGE- -=0,0002 Oi. 00.0 =e OOOD 0.0000 
PAPOR ISH, =0.0002 0.0000 =e IAO2 0000 
ADMINSPT 0.0002 0.0000 10.8476 0.0000 

PROFONL 0.0002 0.0000 ZO OO) 02:0.00:0 
TECHSE LT O:20003 0; 0000 eA OSS) 0.0000 
SoVCOLTHR Ve 00.0 7. 0.0000 TO. :0226 0.0000 
SVEPROT =0.0002 0.0000 25:6266 0.0000 
SALES 0.0002 0.0000 14.9044 0.0000 
CREISMAN ~-@20001 O(0-0:0,0 SO ot AS 0.0000 
LABORERS -0.0010 0.0002 FO. ee) 0.0000 
TRANSPO 0.0000 0.0000 626.173 0.0000 
MYVSOGP OL, =00000 0.0000 SO oe 0.0000 
MV50GP02 0.0000 0.0000 900572 0.0000 
MVSUGPO:3 20008 Oe O0:0 a3 e 9054 0.0000 
MV50GP04 -0.0000 0.0000 VO Ok 0.0000 
MVSUGPOS? -=0.7 00111. 0.0001 Se DOL OnsO0050) 
MVSUGPOG —=0.-0002 0.0000 =Lo sO 229 0.0000 
MV50GP 07 Oe 0O03 0.0002 LeoeZ © OV 229 
MVSOUGPO6- S00 001 0.0000 = 103069 O)..0 0:0 
MVS0GPOS- =0 20001 0.0000 =A Ono 0.0000 
MV SUGE LO" 020012 Os OOO Boao ry ce: On OOO4 
MV50GP11 0.0000 0.0000 0.0630 0.9498 


Residual standard error: 0.5602 on 29842 degrees of freedom 

Multiple R-Squared: 0.648 

F-statistic: 2497 on 22 and 29842 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
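
As a consistency check on the two 52D summaries above, the overall F-statistic can be recomputed from the reported R-squared and degrees of freedom, and the effect of dropping MA.POP and un.rate can be gauged with an approximate partial F-test. The sketch below is an illustration only and is not part of the original S-PLUS output. The function names are hypothetical. The partial F assumes both models were fit to the same observations, which holds only approximately here: the reported residual degrees of freedom differ by three rather than two.

```python
def overall_f(r2, k, df_resid):
    """Overall F-statistic for a regression with k predictors,
    computed from R-squared and the residual degrees of freedom."""
    return (r2 / k) / ((1.0 - r2) / df_resid)


def partial_f(r2_full, r2_reduced, q, df_resid_full):
    """Approximate partial F for dropping q predictors from the full model.
    Assumes both models use the same observations (an assumption here)."""
    return ((r2_full - r2_reduced) / q) / ((1.0 - r2_full) / df_resid_full)


# Values as reported in the 52D summaries.
f_full = overall_f(0.6559, 24, 29839)      # full model
f_reduced = overall_f(0.648, 22, 29842)    # model less MA.POP and un.rate
f_drop = partial_f(0.6559, 0.648, 2, 29839)

# The first two round to 2370 and 2497, the reported F-statistics.
print(round(f_full), round(f_reduced), round(f_drop, 1))
```

The recomputed overall F-statistics agree with the printed values, which suggests the R-squared and degrees-of-freedom lines were transcribed correctly even where the coefficient tables are damaged.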


[Figure: residual plots for the 52D models. Recoverable axis labels: residuals of the fitted 52D lm object and ALLDATAbyZIP$q.52D.Avg.Annl.]

LINEAR REGRESSION MODEL FORMULATION 


74D | FULL MODEL: 
q.74D.Avg.Annl ~ un.rate + MA.POP + EXECMNGE + FAFOFISH +
ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES +
CRFTSMAN + LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 +
MV50GP04 + MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 +
MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
4.2. 522 - SOx 156? SO 205232 10.1093" Le 1G 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) Oe Is 0.0104 a ips sO alli 6) 0.0000
un.rate Sls si3s Oya eters, -8.4324 0.0000
MA.POP O00. 0.0000 ZA LOLZ34 0.0000
EXECMNGE -0.0001 0.0000 SL 25612 0.0000
FAFOFISH -0.0004 0.0000 Stl ow fh 0:30-0:00
ADMINSPT 0.0005 0.0000 14.4229 0.0000
PROFSNL 0.0001 0.0000 10.4161 0.0000
TECHSPT O200.0.5 O00 OO 1 O05 65 C4 C0.0:0
SVCOTHR 0.0000 0.0000 4.1740 0.0000
SVCPROT 0.0000 0.0000 25205 0.6027
SALES pr erove ml 02 BO00 6.4668 0010 U0
CRFTSMAN -0.0001 0.0000 -13.9898 0.0000
LABORERS -0.0011 0.0002 =6..2 1029 0.0000
TRANSPO 0.0000 0.0000 re cHObe eZ a4)
MV50GP01 0.0000 0.0000 Sb 2A LS 0.1466
MV50GP02 0.0001 0.0000 ee oO 0.0000
MV50GP03 0.0009 0.0000 Cho mieeye ak 0.0000
MV50GP04 0.0000 0.0000 =10 S036 0.0000
MV50GP05 -0.0012 (Corn One 0 ml =P VeC576 0.0000
MV50GP06 -0.0001 OL 00.0.0 Sle werss 10 Oia CHONOKG
MV50GP07 -0.0003 0.0002 -1.5664 Og IVES
MV50GP08 0.0000 0.0000 =6:5 9234 0.0000
MV50GP09 -0.0001 0.0000 -14.4924 0.0000
MV50GP10 -0.0013 0.0004 =3'2.6.7..0 Keo
MV50GP11 0.0000 0.0001 Cee acre) 0.8607


Residual standard error: 0.6154 on 29839 degrees of freedom 

Multiple R-Squared: 0.6687 

F-statistic: 2509 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
LINEAR REGRESSION MODEL FORMULATION


74D | FULL MODEL LESS MA.POP and un.rate: 
q.74D.Avg.Annl ~ EXECMNGE + FAFOFISH + ADMINSPT + PROFSNL + 
TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN + LABORERS + 
TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 + MV50GP05 +
MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 + MV50GP11 


Residuals: Min 1Q Median 3Q Max
S408 S =O. 55s) =O0.04 59: (Ocoee 14 23 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) Orn. 36.0 0 4!0:0 5: Ae ODO Or 0;0-010
EXECMNGE -0.0002 0.0000 -28.4424 0.0000
FAFOFISH -0.0003 0.0000 =O452 12 0.0000
ADMINSPT 0.0005 0.0000 TA .64:1.5 0.0000
PROFSNL 0.0003 0.0000 2A 31689 0.0000
TECHSPT 0.0003 0.0000 ices Ole Os :0'O:0:0
SVCOTHR 0.0001 0.0000 Thelo35 0.0000
SVCPROT -0.0001 0.0000 Sy LOSZ 0.0885
SALES C20C07 O00 OO Treo 2 52 0.0000
CRFTSMAN -0.0001 0.0000 =i e5o1/ 0.0000
LABORERS -0.0013 0.0002 = 0769 0.0000
TRANSPO (pr evenere 02 BO00 4.9989 Cea Ov erene
MV50GP01 -0.0000 0.0000 -4.1921 0.0000
MV50GP02 0.0000 0.0000 1 @34 36 0.0000
MV50GP03 0.0009 0.0000 3535627 0.0000
MV50GP04 -0.0001 0.0000 -9.9278 0.0000
MV50GP05 -0.0013 0.0002 -8.7616 0.0000
MV50GP06 -0.0002 0.0000 Ss oon Ohleoto, 0.0000
MV50GP07 -0.0001 0.0002 -0.3028 0.7620
MV50GP08 -0.0000 0.0000 -3.9972 0.0001
MV50GP09 -0.0001 0.0000 =12 .45066 0.0000
MV50GP10 -0.0012 0.0004 =3. 17680 O40 125
MV50GP11 0.0000 0.0001 0.1984 O.84827


Residual standard error: 0.622 on 29842 degrees of freedom 
Multiple R-Squared: 0.6615
F-statistic: 2650 on 22 and 29842 degrees of freedom, the p-value is 0 
LINEAR REGRESSION MODEL FORMULATION


77F | FULL MODEL:
q.77F.Avg.Annl ~ un.rate + MA.POP + EXECMNGE + FAFOFISH +
ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES +
CRFTSMAN + LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 +
MV50GP04 + MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 +
MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
6.618 6: =O el. =0.05605: -Oalsot 1.9426 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) (CRamIe GhS) Oe. Oo! 9.9687 0.0000
un.rate =b.4504 0.1964 =iooae 0.0000
MA.POP pr evene nl Oy OO OU 20';4.93'0 0.0000
EXECMNGE -0.0001 0.0000 ales eee An Anllagy, 0.0000
FAFOFISH -0.0005 0.0000 Si eo Oa 0.0000
ADMINSPT O00 US 0.0000 Le 28eo5 0.0000
PROFSNL O20-0'0. 4. OL OO O0 7.3996 Oh OOO
TECHSPT 0.0006 0.0000 20.9363 0.0000
SVCOTHR O.0 0:0. 00000 6424 79 0.0000
SVCPROT 0.10002 0.0000 Ac De 0.0000
SALES 0.0001 Ok OO Se OOS 0.0004
CRFTSMAN -0.0002 O00 OF =5i20 Jo 020.000
LABORERS -0.0017 0.0002 aes NOS 0.0000
TRANSPO 0.0000 0.0000 ono nT es 0.70:9:74
MV50GP01 0.0000 Te0C.00 O24 56 On ca ar a
MV50GP02 0:07 O50 J. 0.0000 ote ol Oe ey, 0.0000
MV50GP03 O00 172 0.0000 AO = Oe 0.0000
MV50GP04 0.0000 0.0000 -8.6634 0.0000
MV50GP05 -0.0015 O00 07 =6 4.1599 Oi. OO.
MV50GP06 -0.0002 0.0000 -14.0749 0.0000
MV50GP07 -0.0007 0.0002 =3. 489 0.0016
MV50GP08 (Gp Crepere 0.0000 6.1784 0.0000
MV50GP09 -0.0001 0.0000 Si 1659 0.0000
MV50GP10 -0.0014 O00 05 as Weel Bo Po ko (Gian Oe hay,
MV50GP11 0.0000 0.0001 Ucaovs 0.6116


Residual standard error: 0.7759 on 29839 degrees of freedom 

Multiple R-Squared: 0.6811 

F-statistic: 2656 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
LINEAR REGRESSION MODEL FORMULATION


77F | FULL MODEL LESS MA.POP and un.rate: 
q.77F.Avg.Annl ~ EXECMNGE + FAFOFISH + ADMINSPT + PROFSNL + 
TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN + LABORERS +
TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 + MV50GP05 +
MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
=6.4,06:. =O, 1696 =O 0516.0 124-4: 19 ...33 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) 0.0446 0.0064 6.9823 0.0000
EXECMNGE -0.0002 0.0000 2214/1956 0.0000
FAFOFISH -0.0004 0.0000 -9.6641 0.0000
ADMINSPT 0.0005 0.0000 19.0766 0.0000
PROFSNL 0.0002 0.0000 PO2AZ OS 0.0000
TECHSPT 0.0005 0.0000 16.5052 0.0000
SVCOTHR 0.0001 0.0000 12.6605 0.0000
SVCPROT 0.0001 0.0000 2340 3:1 O.J016.3
SALES 0.0002 0.0000 8.9922 0.0000
CRFTSMAN -0.0002 0.0000 See Dk 0.0000
LABORERS -0.0018 0.0002 =] e595 0.0000
TRANSPO pr evenere 02 BO00 4.9229 0.0000
MV50GP01 -0.0000 0.0000 -1.9400 0.0524
MV50GP02 0.0000 0.0000 eos 0.0000
MV50GP03 O.0:09,3 0.0000 40.5416 0.0000
MV50GP04 -0.0001 0.0000 -6.0393 0.0000
MV50GP05 -0.0017 0.0002 Oa OL 0.0000
MV50GP06 -0.0003 0.0000 aot omy oa Ero 0.0000
MV50GP07 -0.0005 0.0002 -2.0505 0.0403
MV50GP08 0.0001 0.0000 S420 S 0.0000
MV50GP09 -0.0001 0.0000 -9.9397 0.0000
MV50GP10 -0.0013 0.0005 -2.7207 0.0065
MV50GP11 C2 0.007. 0.0001 O.52.135 0.259380


Residual standard error: 0.782 on 29842 degrees of freedom 
Multiple R-Squared: 0.6761
F-statistic: 2831 on 22 and 29842 degrees of freedom, the p-value is 0 
LINEAR REGRESSION MODEL FORMULATION


88M | FULL MODEL:
q.88M.Avg.Annl ~ un.rate + MA.POP + EXECMNGE + FAFOFISH +
ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES +
CRFTSMAN + LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 +
MV50GP04 + MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 +
MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
Ono or Ooh OS tS SOOT ® (el 3A: Lo. 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) 0.1344 0.0135 9.9459 0.0000
un.rate -1.4912 0.2026 Sea O29 0.0000
MA.POP O20 00s: 0.0000 19.6025 0.0000
EXECMNGE -0.0001 0.0000 HG... aa 0.0000
FAFOFISH -0.0005 0.0000 -12.6640 0.0000
ADMINSPT 0.0005 0.0000 19.567 / 0.0000
PROFSNL 0.0001 0.0000 Tw 957 0.0000
TECHSPT OOO 7 00:0 0:0 ZO IZA Gin OHeROte
SVCOTHR 0.0001 0.0000 6.7995 0.0000
SVCPROT 0.0002 0.0000 4.5486 0.0000
SALES 0000 0.0000 2.9086 0.0036
CRFTSMAN -0.0002 0.0000 -15.6859 0.0000
LABORERS -0.0017 0.0002 =) VZO5 0.0000
TRANSPO 0.0000 0.0000 1.8679 0.0618
MV50GP01 0.0000 0.0000 250-20 O62
MV50GP02 0.0001 0.0000 8.6356 0.0000
MV50GP03 0.0013 0.0000 40.6920 0.0000
MV50GP04 0.0000 0.0000 -8.4558 0.0000
MV50GP05 -0.0016 0.0002 =¢°. L412 0.0000
MV50GP06 -0.0002 OPN ONOROKE -14.0040 0.0000
MV50GP07 -0.0008 0.0002 =3 4417 0.0017
MV50GP08 0.0000 0.0000 Tole 0.0000
MV50GP09 -0.0001 0.0000 S62 74. 0.0000
MV50GP10 -0.0015 0.0005 -3.1066 0.0019
MV50GP11 0.0000 Oe OOO 0.5411 0.5884


Residual standard error: 0.8012 on 29839 degrees of freedom 

Multiple R-Squared: 0.6812 

F-statistic: 2656 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
LINEAR REGRESSION MODEL FORMULATION


88M | FULL MODEL LESS MA.POP and un.rate:
q.88M.Avg.Annl ~ EXECMNGE + FAFOFISH + ADMINSPT + PROFSNL +
TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN + LABORERS +
TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 + MV50GP05 +
MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 + MV50GP11 


Residuals: Min 1Q Median 3Q Max
=O252 =05.1.953 =06 0529-1 27S Loe 82 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) 0.0461 0 i0.0°62/ ioe eh OpIBe) 0.0000
EXECMNGE -0.0002 0.0000 221724865 0.0000
FAFOFISH -0.0004 0.0000 -9.8691 0.0000
ADMINSPT 0.0005 0.0000 19 2/640 0.0000
PROFSNL 0.0002 0.0000 16.294 7-0.5 0.0000
TECHSPT 0.0005 0.0000 16.2517 9 0.0000
SVCOTHR 0.0001 0.0000 Teo Z2 1 0.0000
SVCPROT Ow 6:00 0.0000 ZOO LS Ome OO ay
SALES 00-004 O00 OO Su. 05) 0400.00
CRFTSMAN -0.0002 0.0000 -15.6876 0.0000
LABORERS -0.0018 0.0002 =o OS 0.0000
TRANSPO pr evenere 02 BO00 5.20305 O00 00
MV50GP01 -0.0000 0.0000 Sato 56 Opap eres)
MV50GP02 0.0000 0.0000 ead 0.0000
MV50GP03 O.0:09,3 0.0000 41.0663 0.0000
MV50GP04 -0.0001 0.0000 -7.8656 0.0000
MV50GP05 -0.0017 0.0002 6.893635 0.0000
MV50GP06 -0.0003 0.0000 -16.1294 0.0000
MV50GP07 -0.0005 0.0003 -2.0636 OOS or
MV50GP08 0.0001 0.0000 9217402 0.0000
MV50GP09 -0.0001 Oe CHENONG = O40 O10 0. 0:00:0
MV50GP10 -0.0013 0.0005 -2.7094 0.0067
MV50GP11 C2 0.007. 0.0001 0.5620 O25 74


Residual standard error: 0.807 on 29842 degrees of freedom 
Multiple R-Squared: 0.6764 
F-statistic: 2835 on 22 and 29842 degrees of freedom, the p-value is 0 
LINEAR REGRESSION MODEL FORMULATION


95B | FULL MODEL: 
q.95B.Avg.Annl ~ un.rate + MA.POP + EXECMNGE + FAFOFISH +
ADMINSPT + PROFSNL + TECHSPT + SVCOTHR + SVCPROT + SALES +
CRFTSMAN + LABORERS + TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 +
MV50GP04 + MV50GP05 + MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 +
MV50GP10 + MV50GP11


Residuals: Min 1Q Median 3Q Max
=o Oa? SOI B22 sO sO 562 1) Od 2 eS. Ly 2? 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) Ole Oi, O23 as Case ile, 0.0000
un.rate -1.4365 0.1842 -7.8089 0.0000
MA.POP 0.0001 0.0000 2g DO) 0.0000
EXECMNGE -0.0001 0.0000 =1 748912 0.0000
FAFOFISH -0.0005 0.0000 S240 0 0:30:00:
ADMINSPT 0.0004 Oe O00 il eek Ales) 0.0000
PROFSNL 0.0001 0.0000 7.9669 0.0000
TECHSPT 0.0006 0.0000 ZL eA3ts Gin OeROHe
SVCOTHR 0.0001 0.0000 errosiew, 0.0000
SVCPROT 0.0001 0.0000 345906 0.0004
SALES O/0:0'011. 0.0000 4.2069 0.0000
CRFTSMAN -0.0002 0.0000 -14.9493 0.0000
LABORERS -0.0015 0.0002 =6. 1:6 LO 0.0000
TRANSPO 0.0000 0.0000 1.8494 0.0644
MV50GP01 0.0000 0.0000 =i SZ 0.7924
MV50GP02 0.0001 0.0000 Die 72359 0.0000
MV50GP03 Oy -O.0L 7. 0.0000 37.9182 0.0000
MV50GP04 0.0000 0.0000 =9 <4: 716 0.0000
MV50GP05 -0.0013 0.0002 -7.4695 0.0000
MV50GP06 -0.0002 OPN ONOROKE =o cO LO 0.0000
MV50GP07 -0.0006 0.0002 -2.6619 0.0078
MV50GP08 0.0000 0.0000 0.7910 0.4290
MV50GP09 -0.0001 0.0000 S172 32948 0.0000
MV50GP10 -0.0014 0.0004 =345024 0.0010
MV50GP11 0.0000 0.0001 0.4765 0.6339


Residual standard error: 0.7278 on 29839 degrees of freedom 

Multiple R-Squared: 0.679 

F-statistic: 2630 on 24 and 29839 degrees of freedom, the p-value is 0 
1 observations deleted due to missing values 
LINEAR REGRESSION MODEL FORMULATION


95B | FULL MODEL LESS MA.POP and un.rate: 
q.95B.Avg.Annl ~ EXECMNGE + FAFOFISH + ADMINSPT + PROFSNL + 
TECHSPT + SVCOTHR + SVCPROT + SALES + CRFTSMAN + LABORERS + 
TRANSPO + MV50GP01 + MV50GP02 + MV50GP03 + MV50GP04 + MV50GP05 + 
MV50GP06 + MV50GP07 + MV50GP08 + MV50GP09 + MV50GP10 + MV50GP11 


Residuals: Min 1Q Median 3Q Max
SO 0p OS HO OA oe Ae 2 LO Se OS 


Coefficients Value Std.Error t-value Pr(>|t|)
(Intercept) 0.0413 0.0060 cMrer ollie: 0.0000
EXECMNGE -0.0002 0.0000 =o OO D5 0.0000
FAFOFISH -0.0004 0.0000 =e oan 0.0000
ADMINSPT 0.0004 0.0000 Le? LOS 0.0000
PROFSNL 0.0002 0.0000 ZO a Ore 0.0000
TECHSPT O:20005 0.0000 LG uZooo 0.0000
SVCOTHR 00.07 0.0000 LZ Do e84 0.0000
SVCPROT e000 T 0.0000 LyAOwZ U1 3.63
SALES 0.0002 0.0000 LOS 0.0000
CRFTSMAN -0.0002 0.0000 EZ OD 0.0000
LABORERS -0.0017 0.0002 TO Oe 0.0000
TRANSPO 0.0000 0.0000 De 53468 0.0000
MV50GP01 S00000 0.0000 = U'0054
MV50GP02 0.0000 0.0000 Soke sore: O'000:0
MV50GP03 20011 Oe OC OZ 5633030 0.0000
MV50GP04 -0.0001 0.0000 OOO 0.0000
MV50GP05 —=U./0019 0.0002 aio eer ropl OO0:050
MV50GP06 -0.0002 0.0000 =L6219.90 0.0000
MV50GP07 -0.0003 0.0002 1A O33 Ome bap
MV50GP08 0.0000 0.0000 SivAo se U.0 00.5
MV50GP09 -0.0001 0.0000 =O. O22 0.0000
MV50GP10 Oe 0OUwe2 0.0006 SL Oot. O.0043
MV50GP11 0.0001 0.0001 0.4959 0.6200


Residual standard error: 0.7343 on 29842 degrees of freedom 
Multiple R-Squared: 0.6731 
F-statistic: 2793 on 22 and 29842 degrees of freedom, the p-value is 0 
LINEAR REGRESSION MODEL FORMULATION